I have a non-empty integer interval [a; b). I want to generate a random non-empty integer subinterval [c; d) (where a <= c and d <= b). The [c; d) interval must be uniformly distributed in the sense that every point in [a; b) must be equally likely to end up in [c; d).
I tried generating uniformly distributed c from [a; b - 1), and then uniformly distributed d from [c + 1; b), like this:
```python
import numpy as np

a = -100
b = 100
N = 10000
cs = np.random.randint(a, b - 1, N)
ds = np.random.randint(cs + 1, b)
```
But when I measure how often each point ends up in [c; d), the distribution is clearly non-uniform:
```python
import numpy as np
import matplotlib.pyplot as plt

hist = np.zeros(b - a, int)
for c, d in zip(cs, ds):
    hist[c - a:d - a] += 1
plt.plot(np.arange(a, b), hist)
plt.show()
```
How do I do this correctly?
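The non-uniformity can be computed exactly instead of sampled. Below is a minimal sketch that assumes the same distributions for c and d as the question's code. The function name `naive_coverage` is hypothetical. A point p is covered when c <= p < d. Because d is drawn from [c + 1; b), d never reaches b, so the last point b - 1 is never covered.

```python
from fractions import Fraction

def naive_coverage(a, b):
    """Exact P(c <= p < d) for each p in [a, b), where c is uniform on
    [a, b - 1) and then d is uniform on [c + 1, b), as in the question."""
    n_c = b - 1 - a  # number of possible values of c
    probs = []
    for p in range(a, b):
        total = Fraction(0)
        for c in range(a, min(p, b - 2) + 1):  # only c <= p can cover p
            n_d = b - (c + 1)  # number of possible values of d in [c + 1, b)
            hits = b - (p + 1)  # values of d in [p + 1, b) put p inside [c, d)
            total += Fraction(1, n_c) * Fraction(hits, n_d)
        probs.append(total)
    return probs

print([str(q) for q in naive_coverage(0, 4)])  # ['1/3', '5/9', '11/18', '0']
```

For [0; 4) the four points are covered with probabilities 1/3, 5/9, 11/18 and 0. Points toward the right are favoured until the very last one, which can never be reached.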
If you divide the range [a,b) into k non-empty sections and then select one of those sections at random, each point is selected with probability exactly 1/k, because every point lies in exactly one section. It doesn't matter how you divide the range.
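The argument can be checked exactly for any partition. The sketch below uses the hypothetical helper names `sections_from_cuts` and `coverage_of_random_section`. Each section is chosen with probability 1/k. The sections are disjoint and together cover [a, b), so every point accumulates exactly 1/k, whatever the cut points are.

```python
from fractions import Fraction
import random

def sections_from_cuts(a, b, cuts):
    """Split [a, b) into half-open sections at distinct cut points a < cut < b."""
    bounds = [a, *sorted(cuts), b]
    return list(zip(bounds, bounds[1:]))

def coverage_of_random_section(sections):
    """Exact probability that each point is covered when one of the
    k sections is chosen uniformly at random."""
    k = len(sections)
    prob = {}
    for lo, hi in sections:
        for p in range(lo, hi):
            # p lies only in this section, which is chosen with probability 1/k
            prob[p] = prob.get(p, Fraction(0)) + Fraction(1, k)
    return prob

# Any partition works: here 4 random cuts split [0, 20) into 5 sections.
cuts = random.sample(range(1, 20), 4)
probs = coverage_of_random_section(sections_from_cuts(0, 20, cuts))
print(set(probs.values()))  # {Fraction(1, 5)}
```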
Here's a solution along those lines that uses a pretty uniform selection of division points.
```python
from random import randint

# N is the number of sampled intervals; the range needs b - a >= 3
a, b, N = -100, 100, 1000000

intervals = []
while len(intervals) < N:
    # divide the range into 3 intervals [a,x), [x,y), and [y,b)
    # the distributions of the division points don't change the histogram
    # we're careful here to make sure none of them are empty
    p1 = randint(a + 1, b - 2)
    p2 = randint(a + 1, b - 2)
    x = min(p1, p2)
    y = max(p1, p2) + 1
    # select one at random
    intervals.append([(a, x), (x, y), (y, b)][randint(0, 2)])
```
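The histogram check from the question can confirm this empirically. Below is a sketch of a hypothetical generalization, `random_subinterval(a, b, k)`. It cuts [a; b) at k - 1 distinct random points instead of the two possibly-equal points above. By the argument above, each point should then be covered in about N/k of the samples. The names `random_subinterval` and `coverage_histogram` are illustrative only.

```python
import random
import numpy as np

def random_subinterval(a, b, k, rng=random):
    """Hypothetical variant: cut [a, b) into k non-empty sections at k - 1
    distinct random points, then return one section (c, d) chosen uniformly.
    Requires 1 <= k <= b - a."""
    cuts = sorted(rng.sample(range(a + 1, b), k - 1))
    bounds = [a, *cuts, b]
    i = rng.randrange(k)
    return bounds[i], bounds[i + 1]

def coverage_histogram(a, b, intervals):
    """Count how many of the half-open intervals cover each point of [a, b)."""
    hist = np.zeros(b - a, dtype=int)
    for c, d in intervals:
        hist[c - a:d - a] += 1
    return hist

if __name__ == "__main__":
    a, b, n, k = -100, 100, 100_000, 3
    rng = random.Random(0)
    intervals = [random_subinterval(a, b, k, rng) for _ in range(n)]
    hist = coverage_histogram(a, b, intervals)
    # every count should be close to n / k
    print(hist.min(), hist.max(), n / k)
```

With k = 3 this produces the same kind of intervals as the loop above, though the division points follow a different distribution. As noted, that does not affect uniformity. A larger k gives shorter intervals: the expected length is (b - a)/k.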