I am trying to build a list of length = 120, which shall include 4 numbers. The tricky thing is that the numbers shall not appear in a row and each number should occur exactly to the same amount.
This is my script.
import random
List = [1,2,3,4]
seq_CG = random.sample(List,len(List))
for i in range(121):
randomList = random.sample(List, len(List))
if randomList[0]!=seq_CG[-1]:
seq_CG.append(randomList)
i = len(seq_CG)
print List, randomList, seq_CG
I am pretty close. However something does not work. And maybe there is even a shorter and more random solution?
In the big list seq_CG i do not want that numbers appear in a row. In my example it is filled with a lot of smaller lists. However it would be even nicer to have a random list of 120 numbers with an equal distribution of each number, where numbers do not appear in a row.
Here are a couple of solutions.
The first algorithm maintains an index idx
into the sequence and on each call idx
is randomly modified to a different index so it's impossible for a yielded value to equal the previous value.
from random import randrange
from itertools import islice
from collections import Counter
def non_repeating(seq):
m = len(seq)
idx = randrange(0, m)
while True:
yield seq[idx]
idx = (idx + randrange(1, m)) % m
seq = [1, 2, 3, 4]
print(''.join(map(str, islice(non_repeating(seq), 60))))
ctr = Counter(islice(non_repeating(seq), 12000))
print(ctr)
typical output
313231412323431321312312131413242424121414314243432414241413
Counter({1: 3017, 4: 3012, 3: 2993, 2: 2978})
The distribution of values produced by that code looks fairly uniform, but I haven't analysed it mathematically, and I make no guarantees as to its uniformity.
The following code is more complex, but it does give a uniform distribution. Repeated values are not discarded, they are temporarily added to a pool of repeated values, and the algorithm tries to use values in the pool as soon as possible. If it can't find a suitable value in the pool it generates a new random value.
from random import choice
from itertools import islice
from collections import Counter
def non_repeating(seq):
pool = []
prev = None
while True:
p = set(pool).difference([prev])
if p:
current = p.pop()
pool.remove(current)
else:
current = choice(seq)
if current == prev:
pool.append(current)
continue
yield current
prev = current
seq = [1, 2, 3, 4]
print(''.join(map(str, islice(non_repeating(seq), 60))))
ctr = Counter(islice(non_repeating(seq), 12000))
print(ctr)
typical output
142134314121212124343242324143123212323414131323434212124232
Counter({4: 3015, 2: 3005, 3: 3001, 1: 2979})
If the length of the input sequence is only 2 or 3 the pool can get quite large, but for longer sequences it generally only holds a few values.
Finally, here's a version that gives an exactly uniform distribution. Do not attempt to use it on an input sequence of 2 (or fewer) elements because it's likely to get stuck in an infinite loop; of course, there are only 2 solutions for such an input sequence anyway. :)
I'm not proud of this rather ugly code, but at least it does the job. I'm creating an output list of length 60 so that it fits nicely on the screen, but this code has no trouble generating much larger sequences.
from random import shuffle
from itertools import groupby
from collections import Counter
def non_repeating(seq, copies=3):
seq = seq * copies
while True:
shuffle(seq)
result, pool = [], []
for k, g in groupby(seq):
result.append(k)
n = len(list(g)) - 1
if n:
pool.extend(n * [k])
for u in pool:
for i in range(len(result) - 1):
if result[i] != u != result[i + 1]:
result.insert(i+1, u)
break
else:
break
else:
return result
# Test that sequence doesn't contain repeats
def verify(seq):
return all(len(list(g)) == 1 for _, g in groupby(seq))
seq = [1, 2, 3, 4]
result = non_repeating(seq, 15)
print(''.join(map(str, result)))
print(verify(result))
print(Counter(result))
typical output
241413414241343212423232123241234123124342342141313414132313
True
Counter({1: 15, 2: 15, 3: 15, 4: 15})