Another phrasing is: randomly partition N identical items into k buckets, allowing some buckets to be empty.
For this discussion, I'd like to write a function f(N, k) that selects uniformly at random among all possible vectors of length k of non-negative integers summing to N, and returns the selected vector. (For example, with N = 2 and k = 3 the possible vectors are [2,0,0], [0,2,0], [0,0,2], [1,1,0], [1,0,1], and [0,1,1], and each should be returned with probability 1/6.)
It would be nice if there were a solution that worked for all k >= 1, but I'm particularly interested in k > N, so it's fine to focus on or limit to that condition. And if we have to delve into approximations or heuristics, it's okay to consider k large enough that most of the vector entries must be zero (so at least k > 2N).
My initial thoughts are:
One idea is to first make a random integer partition of N and then place its parts into the length-k vector. Done naively, this treats an output vector with a single entry of N as just as likely as one with N entries of 1. That isn't correct. But maybe there is a simple weighting that can be applied to the "make a random integer partition of N" step which would correct for this?
While this feels cleaner to start, I think it would be much messier to try to "re-weight". The weights for part 1 sound like a difficult algorithm question to me, but I can at least imagine what needs to be calculated. Here, I'm not even sure what needs to be reweighted, or how.
The reason I think it likely still needs reweighting is that there is exactly one sequence of random choices that leads to the vector [N, 0, 0, ..., 0], but N! sequences of random choices that lead to the vector starting with N ones, [1, 1, ..., 1, 0, 0, ..., 0]. Calculating the ratios of these "incorrect weightings" in the final result sounds doable, but I don't know how I'd go about reweighting the individual steps to correct for it.
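To make that bias concrete, here's a quick simulation (just a sketch, assuming this idea means dropping each of the N items into a uniformly random bucket; drop_items is only for illustration, not a proposed solution):

def drop_items(n, k)
  ans = [0] * k
  n.times { ans[rand(k)] += 1 } # each item lands in a uniformly random bucket
  ans
end

counts = Hash.new(0)
100_000.times { counts[drop_items(2, 3)] += 1 }
p counts
# vectors like [2, 0, 0] show up about half as often as [1, 1, 0],
# even though a uniform scheme would give every vector the same probability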
Generate k-1 random integers in the range 0..n. Treat these as cut points into [1, 1, ..., 1] (an array of n ones). Then the sums between consecutive cut points (and the endpoints) are your vector.
E.g., n=2, k=5:
say we get 0, 1, 1, 2
we think of this as: [|1||1|]
which we interpret as [0, 1, 0, 1, 0] (treating gaps as zeroes).
If we had 0,0,2,2 instead, we'd have [||1,1||] or [0,0,2,0,0]
Here's Ruby code for this:
def f(n, k)
  arr = [0]
  ans = []
  (k - 1).times do
    arr.append(rand(n + 1))
  end
  arr.append(n)
  arr.sort!
  1.upto(k) do |i|
    ans.append(arr[i] - arr[i - 1])
  end
  return ans
end
Running time is O(k log k) because of the sort. We might be able to avoid the sort by generating the random numbers in sorted order.
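For example (just a sketch of that idea): the normalized partial sums of k exponential variates have the same joint distribution as k-1 sorted Uniform(0,1) variates, so the cut points can be produced already sorted in O(k) time. sorted_cuts below is only illustrative; f would build arr as [0] + sorted_cuts(n, k) + [n] and skip the sort!.

def sorted_cuts(n, k)
  # k exponential variates; their normalized partial sums (all but the last)
  # are distributed like k-1 sorted Uniform(0,1) variates
  gaps = Array.new(k) { -Math.log(1.0 - rand) }
  total = gaps.sum
  acc = 0.0
  gaps[0, k - 1].map do |g|
    acc += g
    # map each sorted uniform onto an integer in 0..n
    [(acc / total * (n + 1)).floor, n].min
  end
end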
-- update --
This is not uniform. Here are a million runs of f(2, 4):
m = Hash.new(0)
1_000_000.times do
  m[f(2,4)] += 1
end
=> 1000000
> m
=>
{[1, 0, 1, 0]=>205646,
[0, 1, 1, 0]=>411624,
[0, 0, 1, 1]=>205916,
[0, 1, 0, 1]=>205144,
[0, 0, 2, 0]=>205961,
[0, 0, 0, 2]=>68718,
[1, 0, 0, 1]=>68178,
[0, 2, 0, 0]=>205736,
[2, 0, 0, 0]=>68275,
[1, 1, 0, 0]=>205783}
The counts split roughly 1 : 3 : 6, in proportion to how many orderings of the random cut points produce each vector. For example, [1, 0, 0, 1] requires all three cuts to equal 1 (one ordering), while [0, 1, 1, 0] can come from any ordering of the cuts 0, 1, 2 (six orderings).
--- update ---
Stars and bars works: each of the C(n+k-1, n) ways to arrange n stars and k-1 bars corresponds to exactly one output vector, so a uniform shuffle gives a uniform vector. Here's Ruby code and another million runs:
def g(n, k)
  arr = [1] * n + [0] * (k - 1) # 1's represent stars (what we're counting), and 0's represent bars (separators)
  arr.shuffle!
  ans = []
  sum = 0
  arr.each do |val|
    if val == 0
      ans.append(sum)
      sum = 0
    else
      sum += val
    end
  end
  ans.append(sum)
end
m = Hash.new(0)
1_000_000.times do
  m[g(2,4)] += 1
end
=> 1000000
> m
=>
{[1, 1, 0, 0]=>99977,
[0, 2, 0, 0]=>100150,
[1, 0, 0, 1]=>100201,
[0, 1, 1, 0]=>100034,
[0, 1, 0, 1]=>99422,
[0, 0, 0, 2]=>99865,
[2, 0, 0, 0]=>99662,
[1, 0, 1, 0]=>100359,
[0, 0, 1, 1]=>100332,
[0, 0, 2, 0]=>99998}
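Since the question cares most about k much larger than N, here's one more sketch of the same stars-and-bars construction (a variant I haven't benchmarked; it just places the stars a different way): instead of building and shuffling an array of length n + k - 1, sample only the n star positions among the n + k - 1 slots. A star's bucket is the number of bars to its left, i.e. its slot index minus the number of stars before it.

require 'set'

def g_sparse(n, k)
  slots = n + k - 1
  stars = Set.new
  # rejection sampling for a uniform n-subset; fast when n is much smaller than slots
  stars << rand(slots) until stars.size == n
  ans = [0] * k
  stars.sort.each_with_index do |pos, i|
    ans[pos - i] += 1 # pos - i = number of bars before this star = its bucket index
  end
  ans
end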