pythonpython-2.7open-sesame

Create List without similar crossovers


I am trying to build a list of length = 120, which shall include 4 numbers. The tricky thing is that the numbers shall not appear in a row and each number should occur exactly to the same amount.

This is my script.

import random
List = [1,2,3,4]
seq_CG = random.sample(List,len(List))
for i in range(121):
    randomList = random.sample(List, len(List))
    if randomList[0]!=seq_CG[-1]:
        seq_CG.append(randomList)
        i = len(seq_CG)
print List, randomList, seq_CG

I am pretty close. However something does not work. And maybe there is even a shorter and more random solution?

In the big list seq_CG i do not want that numbers appear in a row. In my example it is filled with a lot of smaller lists. However it would be even nicer to have a random list of 120 numbers with an equal distribution of each number, where numbers do not appear in a row.


Solution

  • Here are a couple of solutions.

    The first algorithm maintains an index idx into the sequence and on each call idx is randomly modified to a different index so it's impossible for a yielded value to equal the previous value.

    from random import randrange
    from itertools import islice
    from collections import Counter
    
    def non_repeating(seq):
        m = len(seq)
        idx = randrange(0, m)
        while True:
            yield seq[idx]
            idx = (idx + randrange(1, m)) % m
    
    seq = [1, 2, 3, 4]
    print(''.join(map(str, islice(non_repeating(seq), 60))))
    
    ctr = Counter(islice(non_repeating(seq), 12000))
    print(ctr)
    

    typical output

    313231412323431321312312131413242424121414314243432414241413
    Counter({1: 3017, 4: 3012, 3: 2993, 2: 2978})
    

    The distribution of values produced by that code looks fairly uniform, but I haven't analysed it mathematically, and I make no guarantees as to its uniformity.

    The following code is more complex, but it does give a uniform distribution. Repeated values are not discarded, they are temporarily added to a pool of repeated values, and the algorithm tries to use values in the pool as soon as possible. If it can't find a suitable value in the pool it generates a new random value.

    from random import choice
    from itertools import islice
    from collections import Counter
    
    def non_repeating(seq):
        pool = []
        prev = None
        while True:
            p = set(pool).difference([prev])
            if p:
                current = p.pop()
                pool.remove(current)
            else:
                current = choice(seq)
                if current == prev:
                    pool.append(current)
                    continue
            yield current
            prev = current
    
    seq = [1, 2, 3, 4]
    print(''.join(map(str, islice(non_repeating(seq), 60))))
    
    ctr = Counter(islice(non_repeating(seq), 12000))
    print(ctr)
    

    typical output

    142134314121212124343242324143123212323414131323434212124232
    Counter({4: 3015, 2: 3005, 3: 3001, 1: 2979})
    

    If the length of the input sequence is only 2 or 3 the pool can get quite large, but for longer sequences it generally only holds a few values.


    Finally, here's a version that gives an exactly uniform distribution. Do not attempt to use it on an input sequence of 2 (or fewer) elements because it's likely to get stuck in an infinite loop; of course, there are only 2 solutions for such an input sequence anyway. :)

    I'm not proud of this rather ugly code, but at least it does the job. I'm creating an output list of length 60 so that it fits nicely on the screen, but this code has no trouble generating much larger sequences.

    from random import shuffle
    from itertools import groupby
    from collections import Counter
    
    def non_repeating(seq, copies=3):
        seq = seq * copies
        while True:
            shuffle(seq)
            result, pool = [], []
            for k, g in groupby(seq):
                result.append(k)
                n = len(list(g)) - 1
                if n:
                    pool.extend(n * [k])
    
            for u in pool:
                for i in range(len(result) - 1):
                    if result[i] != u != result[i + 1]:
                        result.insert(i+1, u)
                        break
                else:
                    break
            else:
                return result
    
    # Test that sequence doesn't contain repeats
    def verify(seq):
        return all(len(list(g)) == 1 for _, g in groupby(seq))
    
    seq = [1, 2, 3, 4]
    result = non_repeating(seq, 15)
    print(''.join(map(str, result)))
    print(verify(result))
    print(Counter(result))
    

    typical output

    241413414241343212423232123241234123124342342141313414132313
    True
    Counter({1: 15, 2: 15, 3: 15, 4: 15})