Algorithm to select multiple non-overlapping subsequences of given sequence

You have been given the input data (a sequence of items), for example:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

and your task is to randomly select M non-overlapping samples (subsequences) of the same size N.

For example, if the task was to select 3 samples of size 3, one of the solutions would be:

[3, 4, 5]
[8, 9, 10]
[11, 12, 13]

The samples are unordered so [8, 9, 10], [3, 4, 5], [11, 12, 13] is the same solution. All solutions are expected to have an equal probability of being selected.

My algorithm:

Select randomly first sample: [11, 12, 13]
Remaining input data are [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and [14, 15, 16].
Select randomly second sample from the remaining input data: [3, 4, 5].
Remaining input data (big enough) are [6, 7, 8, 9, 10] and [14, 15, 16].
Select randomly third sample from the remaining input data: [8, 9, 10].

Sadly, this algorithm does not work when the samples are too big. If the task was to select 3 samples of size 5, there exists a solution, but if you use my algorithm and select randomly the first sample as [3, 4, 5, 6, 7], the algorithm will fail.

Of course there is also an obvious brute-force algorithm: find all possible solutions and select randomly one of them. But I was hoping for something more "clever" (time and space efficient).

Solution

import random

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
M = 3
N = 3

dividers = sorted(random.sample(range(len(seq) - M*N + M), M))
starts = [d + i*(N-1) for i, d in enumerate(dividers)]
samples = [seq[start : start+N] for start in starts]
print(samples)

Example output (Attempt This Online!):

[[3, 4, 5], [10, 11, 12], [13, 14, 15]]

Idea is that if you want 3 subsequences of size 3 from overall 16 items, then 7 elements are not in the subsequences. They're instead in the 4 gaps before/after the subsequences. Each gap can be empty. So find 4 nonnegative integers with sum 7 and use them as gap sizes. How to do that was inspired by this answer.

Alternative explanation: If we had N=1, this would simply be sampling M single items. But in general, each of them is accompanied by the next N-1 items to make up a subsequence. So we basically do sample(range(len(seq)), M) for the N=1 case, but for the general case shrink the range by M*(N-1) to allow room for the M times N-1 accompaniers, which we then add.

I tried choosing samples 120000 times and counting how often each starts tuple occurred. Each of the 120 possibilities occurred about 1000 times as expected:

120
((0, 3, 6), 1030)
((0, 3, 7), 1007)
((0, 3, 8), 956)
...
((6, 9, 13), 1010)
((6, 10, 13), 1006)
((7, 10, 13), 986)

Code:

import random
from collections import Counter

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
M = 3
N = 3

ctr = Counter()
for _ in range(120_000):
    dividers = sorted(random.sample(range(len(seq) - M*N + M), M))
    starts = [d + i*(N-1) for i, d in enumerate(dividers)]
    ctr[*starts] += 1

print(len(ctr))
for x in sorted(ctr.items()):
    print(x)

Attempt This Online!