
Confusion regarding the inner workings of NumPy's SeedSequence


In case it matters at all, I'm using Python 3.11.5 64-bit on a Windows 11 Pro desktop computer with NumPy 1.26.4.

To better understand what NumPy does behind the scenes when I ask for a np.random.Generator object from a given SeedSequence, I tried to reconstruct in pure Python what happens when a SeedSequence is initialized from a given entropy value.

I based this on the source code for SeedSequence found here and on my understanding of how uint32 overflow works. On my machine at least, np.dtype(np.uint32).itemsize is 4, so XSHIFT, defined as np.dtype(np.uint32).itemsize * 8 // 2, is 16. With that in mind, I wrote the following code:

seed = int(input('Please enter a seed: '))
Entropy = seed
Spawn_key = ()
Pool_size = 8
N_children_spawned = 0
Pool = [0 for _ in range(Pool_size)]
Assembled_entropy = []
Ent = Entropy + 0
while Ent > 0:
    Assembled_entropy.append(Ent & 0xffffffff)
    Ent >>= 32
if not Assembled_entropy:
    Assembled_entropy = [0]

hash_const = 0x43b0d7e5
for i in range(Pool_size):
    if i < len(Assembled_entropy):
        Assembled_entropy[i] ^= hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        Assembled_entropy[i] *= hash_const
        Assembled_entropy[i] &= 0xffffffff
        Assembled_entropy[i] ^= Assembled_entropy[i] >> 16
        Pool[i] = Assembled_entropy[i]
    else:
        value = hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        value *= hash_const
        value &= 0xffffffff
        value ^= value >> 16
        Pool[i] = value
for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            Pool[i_src] ^= hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            Pool[i_src] *= hash_const
            Pool[i_src] &= 0xffffffff
            Pool[i_src] ^= Pool[i_src] >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * Pool[i_src]) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16
for i_src in range(Pool_size, len(Assembled_entropy)):
    for i_dst in range(Pool_size):
        Assembled_entropy[i_src] ^= hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        Assembled_entropy[i_src] *= hash_const
        Assembled_entropy[i_src] &= 0xffffffff
        Assembled_entropy[i_src] ^= Assembled_entropy[i_src] >> 16
        x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
        y = (0x4973f715 * Assembled_entropy[i_src]) & 0xffffffff
        Pool[i_dst] = x - y
        Pool[i_dst] &= 0xffffffff
        Pool[i_dst] ^= Pool[i_dst] >> 16
print(Pool)

I have copied the shell outputs of some test runs below.

Please enter a seed: 0
[595626433, 3558985979, 200295889, 3864401631, 3155212474, 198111058, 4047350828, 373757291]
Please enter a seed: 1
[2396653877, 491222160, 2441066534, 3196981647, 1764919720, 3210735412, 1132315803, 1197535761]
Please enter a seed: 123456789
[2161290507, 266876805, 2694113549, 3306969538, 3218948428, 3543586554, 886289367, 3129292100]
Please enter a seed: 123456789123456789
[2628723507, 610487362, 209721652, 1960674985, 3519121735, 1259052354, 2097159984, 3934338599]
Please enter a seed: 123456789123456789123456789123456789
[2988668238, 798946769, 2484899198, 1005350017, 2633831484, 343737596, 1402961265, 3184558744]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789
[431881030, 3789410928, 218849910, 879851040, 1423068736, 85390627, 3721593143, 198649564]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789
[702225118, 2293461530, 514808704, 2115883586, 3179647446, 3197133803, 3807436730, 1822195906]

from numpy.random import SeedSequence
seed = int(input('Please enter a seed: '))
seedseq = SeedSequence(entropy=seed, spawn_key=[], pool_size=8, n_children_spawned=0)
print([int(value) for value in seedseq.pool])

However, giving those same seeds to the short program above, which calls NumPy's SeedSequence directly, produces very different results:

Please enter a seed: 0
[2043904064, 467759482, 3940449851, 2747621207, 4006820188, 4161973813, 800317807, 2622167125]
Please enter a seed: 1
[476219752, 3923368624, 2653737542, 2876255837, 1861759290, 3300511046, 3253139541, 2224879358]
Please enter a seed: 123456789
[480462800, 1421661229, 2686834002, 3365909768, 3295673516, 1830753151, 1249963727, 3680881655]
Please enter a seed: 123456789123456789
[3112345096, 1618497203, 2864025213, 3262672577, 379697145, 163816190, 1265228116, 2568065655]
Please enter a seed: 123456789123456789123456789123456789
[2197723902, 2868273012, 1547285866, 2772382071, 2016971656, 1130152919, 897020445, 135618137]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789
[3230290517, 251217303, 1180998335, 454107561, 4150025399, 1840013050, 1216833737, 89665521]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789
[902839167, 3446715647, 2106916613, 1578536987, 595141342, 3126308643, 400300642, 3659109886]

What is going on here?



UPDATE: based on @OskarHoffman's answer, I have fixed my code. It is included here in case anybody is interested.

seed = int(input('Please enter a seed: '))
Entropy = seed
Spawn_key = ()
Pool_size = 8
N_children_spawned = 0
Pool = [0 for _ in range(Pool_size)]
Assembled_entropy = []
Ent = Entropy + 0
while Ent > 0:
    Assembled_entropy.append(Ent & 0xffffffff)
    Ent >>= 32
if not Assembled_entropy:
    Assembled_entropy = [0]

hash_const = 0x43b0d7e5
for i in range(Pool_size):
    if i < len(Assembled_entropy):
        temp = Assembled_entropy[i] ^ hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        temp *= hash_const
        temp &= 0xffffffff
        temp ^= temp >> 16
        Pool[i] = temp
    else:
        value = hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        value *= hash_const
        value &= 0xffffffff
        value ^= value >> 16
        Pool[i] = value
for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            temp = Pool[i_src] ^ hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            temp *= hash_const
            temp &= 0xffffffff
            temp ^= temp >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * temp) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16
for i_src in range(Pool_size, len(Assembled_entropy)):
    for i_dst in range(Pool_size):
        temp = Assembled_entropy[i_src] ^ hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        temp *= hash_const
        temp &= 0xffffffff
        temp ^= temp >> 16
        x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
        y = (0x4973f715 * temp) & 0xffffffff
        Pool[i_dst] = x - y
        Pool[i_dst] &= 0xffffffff
        Pool[i_dst] ^= Pool[i_dst] >> 16
print(Pool)
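
The code above leans on two habits for imitating uint32 arithmetic with Python's unbounded integers: masking with 0xffffffff after every multiplication or subtraction, and splitting a large seed into 32-bit words with the least significant word first. A minimal sketch of both follows. The helper names u32 and split_words are hypothetical, chosen only for this illustration, and are not part of NumPy.

```python
MASK32 = 0xFFFFFFFF


def u32(value):
    """Reduce a Python int to what a uint32 would hold after wraparound."""
    return value & MASK32


def split_words(entropy):
    """Split a non-negative int into 32-bit words, least significant first."""
    words = []
    while entropy > 0:
        words.append(entropy & MASK32)
        entropy >>= 32
    return words or [0]  # a seed of 0 still yields one word


# Subtracting below zero wraps around, as it would for a C uint32_t.
print(u32(5 - 7))              # 4294967294
# Multiplication keeps only the low 32 bits.
print(u32(0xFFFFFFFF * 2))     # 4294967294
# A seed wider than 32 bits becomes several words.
print(split_words(2**32 + 5))  # [5, 1]
```

Masking a negative Python int works because & treats it as an infinitely sign-extended two's complement value, so the low 32 bits match what unsigned hardware arithmetic would give.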

Solution

  • The difference is in your second for loop, where you inline the hashmix() function. To calculate y, you modify your Pool list in place at position i_src. The NumPy implementation does not. It passes the value Pool[i_src] as an argument to hashmix(), so only a copy is modified, and that copy is discarded afterwards. Pool[i_src] itself stays unchanged until it is later used as a destination.

    After changing that for loop to the following:

    for i_src in range(Pool_size):
        for i_dst in range(Pool_size):
            if i_src != i_dst:
                # work with new variable instead of modifying Pool[i_src]
                temp = Pool[i_src] ^ hash_const
                hash_const *= 0x931e8875
                hash_const &= 0xffffffff
                temp *= hash_const
                temp &= 0xffffffff
                temp ^= temp >> 16
                x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
                y = (0x4973f715 * temp) & 0xffffffff
                Pool[i_dst] = x - y
                Pool[i_dst] &= 0xffffffff
                Pool[i_dst] ^= Pool[i_dst] >> 16
    
    

    I get the same results as the NumPy implementation.
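
One way to make the by-value behaviour hard to get wrong is to write the hashing step as a function that returns its result instead of updating a list element. The sketch below restructures the corrected loops that way. It assumes a non-negative integer entropy and an empty spawn key, as in the question. The helper names hashmix and mix follow the NumPy source as I understand it, while entropy_words and seed_pool are hypothetical names used only here. Treat it as an illustration, not as NumPy's actual code.

```python
MASK32 = 0xFFFFFFFF
INIT_A = 0x43B0D7E5
MULT_A = 0x931E8875
MIX_MULT_L = 0xCA01F9DD
MIX_MULT_R = 0x4973F715
XSHIFT = 16


def hashmix(value, hash_const):
    """Return (hashed value, updated constant); the caller's data is untouched."""
    value ^= hash_const
    hash_const = (hash_const * MULT_A) & MASK32
    value = (value * hash_const) & MASK32
    value ^= value >> XSHIFT
    return value, hash_const


def mix(x, y):
    """Combine two 32-bit words into one, emulating uint32 wraparound."""
    result = (MIX_MULT_L * x - MIX_MULT_R * y) & MASK32
    result ^= result >> XSHIFT
    return result


def entropy_words(entropy):
    """Split a non-negative int into 32-bit words, least significant first."""
    words = []
    while entropy > 0:
        words.append(entropy & MASK32)
        entropy >>= 32
    return words or [0]


def seed_pool(entropy, pool_size=8):
    words = entropy_words(entropy)
    hash_const = INIT_A
    pool = []
    # Fill the pool, hashing 0 once the entropy words run out.
    for i in range(pool_size):
        word = words[i] if i < len(words) else 0
        hashed, hash_const = hashmix(word, hash_const)
        pool.append(hashed)
    # Mix every pool word into every other one; pool[i_src] is only read.
    for i_src in range(pool_size):
        for i_dst in range(pool_size):
            if i_src != i_dst:
                hashed, hash_const = hashmix(pool[i_src], hash_const)
                pool[i_dst] = mix(pool[i_dst], hashed)
    # Fold in any entropy words beyond the pool size, again by value.
    for i_src in range(pool_size, len(words)):
        for i_dst in range(pool_size):
            hashed, hash_const = hashmix(words[i_src], hash_const)
            pool[i_dst] = mix(pool[i_dst], hashed)
    return pool


print(seed_pool(0))
```

The third loop needs the same by-value treatment as the second, although it only runs for seeds longer than pool_size words (more than 256 bits with the default pool). The printed list can be compared with the output of the NumPy program in the question for the same seed.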