algorithm random integer-partition

Uniformly randomly generate a vector of k unsigned ints that sums to N


Another phrasing is: randomly partition N identical items into k buckets, allowing some buckets to be empty.

For this discussion:

I'd like to write a function f(N,k) that randomly and uniformly selects amongst the possible vectors of length k that partition N and returns the selected vector.

It would be nice if there was a solution that worked for all k>=1, but I'm particularly interested in k > N. So if it helps to focus or limit to that condition, that is okay. And if we have to delve into approximations/heuristics, it's okay to consider k large enough that most of the vector entries must be zero (so at least k > 2N).


My initial thoughts are:

  1. If N is small enough that it is reasonable to calculate (or look up in a table?) the number of integer partitions of N, then maybe we could proceed as:
    • Create a vector of k unsigned ints initialized to zero
    • Make a random integer partition of N. Let m be the length of this tuple.
    • Place those values in the initial m positions of the vector.
    • Randomly shuffle the vector.

This would naively treat an output vector with one entry containing N as just as likely as one with N entries containing 1. That isn't correct, because far more distinct vectors have N ones than have a single N. But maybe there is a simple weighting that can be applied to "Make a random integer partition of N" which would correct for this?
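
One possible weighting, sketched here as an assumption rather than a known-best method: weight each integer partition by the number of distinct vectors it can be shuffled into, which is k! divided by the factorials of the multiplicities of each value (including the zeros). The helper names `partitions`, `arrangements`, and `weighted_partition_sample` are made up for illustration, and enumerating every partition is only reasonable for small N.

```ruby
# All partitions of n into at most max_parts parts, each part <= max_part,
# listed with parts in non-increasing order.
def partitions(n, max_parts, max_part = n)
  return [[]] if n == 0
  return [] if max_parts == 0
  result = []
  [n, max_part].min.downto(1) do |first|
    partitions(n - first, max_parts - 1, first).each do |rest|
      result << [first] + rest
    end
  end
  result
end

def factorial(x)
  (1..x).reduce(1, :*)
end

# Number of distinct length-k vectors obtained by padding the partition
# with zeros and permuting: k! / (product of multiplicity factorials).
def arrangements(partition, k)
  multiplicities = partition.group_by(&:itself).values.map(&:size)
  multiplicities << (k - partition.length) # number of zeros
  factorial(k) / multiplicities.map { |c| factorial(c) }.reduce(1, :*)
end

# Pick a partition with probability proportional to its arrangement count,
# pad with zeros, then shuffle.
def weighted_partition_sample(n, k)
  parts = partitions(n, k)
  weights = parts.map { |p| arrangements(p, k) }
  pick = rand(weights.sum)
  chosen = parts.zip(weights).find { |_, w| (pick -= w) < 0 }.first
  (chosen + [0] * (k - chosen.length)).shuffle
end
```

For N=2, k=4 the partitions are [2] and [1, 1], with weights 4 and 6, which together account for all 10 possible vectors.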

  2. Another approach which feels cleaner, but would likely still need "re-weighting" somewhere:
    • Create a vector of k unsigned ints initialized to zero
    • do the following N times:
      • randomly choose an element of the vector and increment it

While this feels cleaner to start, I think it would be much messier to try to "re-weight". Finding the weights for the first approach sounds like a difficult algorithm question to me, but I can at least imagine what needs to be calculated. Here, I'm not even sure what needs to be reweighted or how.

The reason I think it likely still needs reweighting is that there is exactly one sequence of random choices that leads to the vector [N,0,0,...,0], but N! sequences of random choices that lead to the vector starting with N ones, [1,1,...,1,0,0,...,0]. Calculating the ratio between these incorrectly weighted outcomes sounds doable, but I don't know how I'd go about reweighting the individual steps to correct for it.
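
To make that imbalance concrete: the number of increment sequences that end at a given vector v is the multinomial coefficient N!/(v_1! v_2! ... v_k!), and each sequence has probability 1/k^N. A minimal sketch follows; `sequence_count` and `increment_probability` are illustrative names, not part of any library.

```ruby
def factorial(x)
  (1..x).reduce(1, :*)
end

# Number of distinct orders of N increments that end at vector v:
# the multinomial coefficient N! / (v_1! * v_2! * ... * v_k!).
def sequence_count(v)
  factorial(v.sum) / v.map { |c| factorial(c) }.reduce(1, :*)
end

# Exact probability that N uniformly random increments over k slots
# end at vector v (each sequence has probability 1 / k**N).
def increment_probability(v)
  Rational(sequence_count(v), v.length**v.sum)
end
```

For example, with N=2 and k=2 the vector [1, 1] has probability 1/2 while [2, 0] and [0, 2] each have 1/4, so the three outcomes are not equally likely.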

  3. Or maybe there is another approach entirely that I have not thought of?

Solution

  • Generate k-1 random ints, each in the range 0..n (inclusive). Treat these as cut points in [1, 1, ..., 1] (a list of n ones). The sums between consecutive cut points (and the two endpoints) are your vector.

    E.g., n=2, k=5:

    say we get 0, 1, 1, 2

    we think of this as: [|1||1|]

    which we interpret as [0, 1, 0, 1, 0] (treating gaps as zeroes).

    If we had 0,0,2,2 instead, we'd have [||1,1||] or [0,0,2,0,0]

    Here's Ruby code for this:

    def f(n, k)
      arr = [0]
      ans = []
      (k-1).times do
        arr.append(rand(n+1))
      end
      arr.append(n)
      arr.sort!
      1.upto(k) do |i|
        ans.append(arr[i] - arr[i-1])
      end
      return ans
    end
    

    Running time is O(k log k) because of the sort. We might be able to avoid the sort by generating the random numbers in sorted order.

    -- update --

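    This is not uniform. Each of the (n+1)^(k-1) sequences of cut points is equally likely, but different vectors arise from different numbers of sequences: with n=2, k=4, the vector [0, 1, 1, 0] comes from the 6 orderings of the cuts 0, 1, 2, while [2, 0, 0, 0] comes only from 2, 2, 2. Here are a million runs of f(2,4):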

    m = Hash.new(0) # tally of how often each vector appears
    1_000_000.times do
      m[f(2,4)] += 1
    end
    => 1000000
    > m
    => 
    {[1, 0, 1, 0]=>205646,
     [0, 1, 1, 0]=>411624,
     [0, 0, 1, 1]=>205916,
     [0, 1, 0, 1]=>205144,
     [0, 0, 2, 0]=>205961,
     [0, 0, 0, 2]=>68718,
     [1, 0, 0, 1]=>68178,
     [0, 2, 0, 0]=>205736,
     [2, 0, 0, 0]=>68275,
     [1, 1, 0, 0]=>205783}
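
The skew can also be checked exactly instead of by sampling, by enumerating every possible sequence of cut points and tallying the vector each one produces. This is a sketch for small n and k only; `exact_cut_counts` is a name introduced here for illustration.

```ruby
# Enumerate all (n+1)**(k-1) equally likely cut sequences used by f(n, k)
# and count how many of them map to each output vector.
def exact_cut_counts(n, k)
  counts = Hash.new(0)
  (0..n).to_a.repeated_permutation(k - 1) do |cuts|
    arr = ([0] + cuts + [n]).sort
    vec = arr.each_cons(2).map { |a, b| b - a }
    counts[vec] += 1
  end
  counts
end
```

For n=2, k=4 this covers 27 cut sequences spread over 10 vectors, with each vector hit 1, 3, or 6 times, which matches the roughly 1:3:6 ratios in the sampled counts.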
    

    --- update ---

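    Stars and bars works: every arrangement of n stars and k-1 bars is produced by the same number of shuffles (n!(k-1)! of them), and each arrangement maps to exactly one vector. Here's Ruby code and another million runs: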

    def g(n,k)
      arr = [1]*n + [0]*(k-1) # 1's represent stars (what we're counting), and 0's represent bars (separators)
      arr.shuffle!
      ans = []
      sum = 0
      arr.each do |val|
        if val == 0
          ans.append(sum)
          sum = 0
        else
          sum += val
        end
      end
      ans.append(sum)
    end
    
    m = Hash.new(0) # fresh tally for this run
    1_000_000.times do
      m[g(2,4)] += 1
    end
    => 1000000
    > m
    => 
    {[1, 1, 0, 0]=>99977,
     [0, 2, 0, 0]=>100150,
     [1, 0, 0, 1]=>100201,
     [0, 1, 1, 0]=>100034,
     [0, 1, 0, 1]=>99422,
     [0, 0, 0, 2]=>99865,
     [2, 0, 0, 0]=>99662,
     [1, 0, 1, 0]=>100359,
     [0, 0, 1, 1]=>100332,
     [0, 0, 2, 0]=>99998}
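
An equivalent way to write the same stars-and-bars idea, offered as a sketch rather than a replacement for g: choose the k-1 bar positions directly out of the n+k-1 slots and read the bucket sizes off the gaps. The names `vector_from_bars` and `stars_and_bars_sample` are introduced here for illustration.

```ruby
# Convert sorted bar positions (indices in 0...(n+k-1)) into bucket sizes:
# each bucket holds the stars strictly between two neighbouring bars.
def vector_from_bars(bars, n, k)
  bounds = [-1] + bars + [n + k - 1]
  bounds.each_cons(2).map { |a, b| b - a - 1 }
end

# Pick k-1 distinct slots for the bars uniformly at random.
def stars_and_bars_sample(n, k)
  bars = (0...(n + k - 1)).to_a.sample(k - 1).sort
  vector_from_bars(bars, n, k)
end
```

Because every set of bar positions is equally likely and maps to a different vector, each of the 10 vectors for n=2, k=4 should come up about equally often, as in the tally for g.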