In case it matters at all, I'm using Python 3.11.5 64-bit on a Windows 11 Pro desktop computer with NumPy 1.26.4.
To better understand what NumPy does behind the scenes when I ask for an np.random.Generator object from a given SeedSequence, I decided to reconstruct in pure Python what happens when I initialize a SeedSequence from a given entropy value.
Based on the source code for SeedSequence found here, my understanding of how uint32 overflow works, and the fact that (on my machine at least) np.dtype(np.uint32).itemsize is 4, so that XSHIFT, defined as np.dtype(np.uint32).itemsize * 8 // 2, is 16, I wrote the following code:
seed = int(input('Please enter a seed: '))
Entropy = seed
Spawn_key = ()
Pool_size = 8
N_children_spawned = 0
Pool = [0 for _ in range(Pool_size)]
Assembled_entropy = []
Ent = Entropy + 0
while Ent > 0:
    Assembled_entropy.append(Ent & 0xffffffff)
    Ent >>= 32
if not Assembled_entropy:
    Assembled_entropy = [0]
hash_const = 0x43b0d7e5
for i in range(Pool_size):
    if i < len(Assembled_entropy):
        Assembled_entropy[i] ^= hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        Assembled_entropy[i] *= hash_const
        Assembled_entropy[i] &= 0xffffffff
        Assembled_entropy[i] ^= Assembled_entropy[i] >> 16
        Pool[i] = Assembled_entropy[i]
    else:
        value = hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        value *= hash_const
        value &= 0xffffffff
        value ^= value >> 16
        Pool[i] = value
for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            Pool[i_src] ^= hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            Pool[i_src] *= hash_const
            Pool[i_src] &= 0xffffffff
            Pool[i_src] ^= Pool[i_src] >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * Pool[i_src]) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16
for i_src in range(Pool_size, len(Assembled_entropy)):
    for i_dst in range(Pool_size):
        Assembled_entropy[i_src] ^= hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        Assembled_entropy[i_src] *= hash_const
        Assembled_entropy[i_src] &= 0xffffffff
        Assembled_entropy[i_src] ^= Assembled_entropy[i_src] >> 16
        x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
        y = (0x4973f715 * Assembled_entropy[i_src]) & 0xffffffff
        Pool[i_dst] = x - y
        Pool[i_dst] &= 0xffffffff
        Pool[i_dst] ^= Pool[i_dst] >> 16
print(Pool)
I have copied the shell outputs of some test runs below.
Please enter a seed: 0
[595626433, 3558985979, 200295889, 3864401631, 3155212474, 198111058, 4047350828, 373757291]
Please enter a seed: 1
[2396653877, 491222160, 2441066534, 3196981647, 1764919720, 3210735412, 1132315803, 1197535761]
Please enter a seed: 123456789
[2161290507, 266876805, 2694113549, 3306969538, 3218948428, 3543586554, 886289367, 3129292100]
Please enter a seed: 123456789123456789
[2628723507, 610487362, 209721652, 1960674985, 3519121735, 1259052354, 2097159984, 3934338599]
Please enter a seed: 123456789123456789123456789123456789
[2988668238, 798946769, 2484899198, 1005350017, 2633831484, 343737596, 1402961265, 3184558744]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789
[431881030, 3789410928, 218849910, 879851040, 1423068736, 85390627, 3721593143, 198649564]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789
[702225118, 2293461530, 514808704, 2115883586, 3179647446, 3197133803, 3807436730, 1822195906]
from numpy.random import SeedSequence
seed = int(input('Please enter a seed: '))
seedseq = SeedSequence(entropy=seed, spawn_key=[], pool_size=8, n_children_spawned=0)
print([int(value) for value in seedseq.pool])
However, providing those same values to the above version of the program, which calls NumPy's SeedSequence directly, gives very different results:
Please enter a seed: 0
[2043904064, 467759482, 3940449851, 2747621207, 4006820188, 4161973813, 800317807, 2622167125]
Please enter a seed: 1
[476219752, 3923368624, 2653737542, 2876255837, 1861759290, 3300511046, 3253139541, 2224879358]
Please enter a seed: 123456789
[480462800, 1421661229, 2686834002, 3365909768, 3295673516, 1830753151, 1249963727, 3680881655]
Please enter a seed: 123456789123456789
[3112345096, 1618497203, 2864025213, 3262672577, 379697145, 163816190, 1265228116, 2568065655]
Please enter a seed: 123456789123456789123456789123456789
[2197723902, 2868273012, 1547285866, 2772382071, 2016971656, 1130152919, 897020445, 135618137]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789
[3230290517, 251217303, 1180998335, 454107561, 4150025399, 1840013050, 1216833737, 89665521]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789
[902839167, 3446715647, 2106916613, 1578536987, 595141342, 3126308643, 400300642, 3659109886]
What is going on here?
UPDATE: based on @OskarHoffman's answer, I have fixed my code. It is included here in case anybody is interested.
seed = int(input('Please enter a seed: '))
Entropy = seed
Spawn_key = ()
Pool_size = 8
N_children_spawned = 0
Pool = [0 for _ in range(Pool_size)]
Assembled_entropy = []
Ent = Entropy + 0
while Ent > 0:
    Assembled_entropy.append(Ent & 0xffffffff)
    Ent >>= 32
if not Assembled_entropy:
    Assembled_entropy = [0]
hash_const = 0x43b0d7e5
for i in range(Pool_size):
    if i < len(Assembled_entropy):
        temp = Assembled_entropy[i] ^ hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        temp *= hash_const
        temp &= 0xffffffff
        temp ^= temp >> 16
        Pool[i] = temp
    else:
        value = hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        value *= hash_const
        value &= 0xffffffff
        value ^= value >> 16
        Pool[i] = value
for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            temp = Pool[i_src] ^ hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            temp *= hash_const
            temp &= 0xffffffff
            temp ^= temp >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * temp) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16
for i_src in range(Pool_size, len(Assembled_entropy)):
    for i_dst in range(Pool_size):
        temp = Assembled_entropy[i_src] ^ hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        temp *= hash_const
        temp &= 0xffffffff
        temp ^= temp >> 16
        x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
        y = (0x4973f715 * temp) & 0xffffffff
        Pool[i_dst] = x - y
        Pool[i_dst] &= 0xffffffff
        Pool[i_dst] ^= Pool[i_dst] >> 16
print(Pool)
The difference is in your second for-loop, the one that applies the hashmix() function to the pool. You modify your Pool list at position i_src in order to calculate the value for y. The NumPy implementation does not. It just copies the value Pool[i_src] (by passing it as an argument to the hashmix function), modifies that copy, and discards it afterwards.
So, modifying that for-loop to:
for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            # work with new variable instead of modifying Pool[i_src]
            temp = Pool[i_src] ^ hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            temp *= hash_const
            temp &= 0xffffffff
            temp ^= temp >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * temp) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16
I get the same results as the NumPy implementation.
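One way to make this kind of accidental in-place update harder to write is to mirror the structure of the NumPy source and move the hashing into small functions that take plain integers and return new ones. The following is only a sketch under a few assumptions: the names hashmix and mix follow the NumPy source, but the (value, hash_const) return tuple and the helpers int_to_uint32_words and mix_entropy are my own illustrative choices, and it only covers an integer entropy with an empty spawn key, as in the question.

```python
MASK32 = 0xffffffff        # emulates uint32 wraparound on Python ints
INIT_A = 0x43b0d7e5        # starting value of hash_const
MULT_A = 0x931e8875        # multiplier that advances hash_const
MIX_MULT_L = 0xca01f9dd
MIX_MULT_R = 0x4973f715
XSHIFT = 16


def hashmix(value, hash_const):
    """Return (hashed value, advanced hash_const); the caller's data is untouched."""
    value = (value ^ hash_const) & MASK32
    hash_const = (hash_const * MULT_A) & MASK32
    value = (value * hash_const) & MASK32
    value ^= value >> XSHIFT
    return value, hash_const


def mix(x, y):
    """Combine two 32-bit words; the mask handles a negative difference."""
    result = (MIX_MULT_L * x - MIX_MULT_R * y) & MASK32
    result ^= result >> XSHIFT
    return result


def int_to_uint32_words(n):
    """Split a non-negative int into 32-bit words, least significant first."""
    words = []
    while n > 0:
        words.append(n & MASK32)
        n >>= 32
    return words or [0]


def mix_entropy(entropy, pool_size=8):
    """Hypothetical helper: build the pool for an int entropy and empty spawn key."""
    words = int_to_uint32_words(entropy)
    hash_const = INIT_A
    pool = [0] * pool_size
    # Fill the pool from the entropy words, padding with hashed zeros.
    for i in range(pool_size):
        word = words[i] if i < len(words) else 0
        pool[i], hash_const = hashmix(word, hash_const)
    # Mix every pool entry into every other one; pool[i_src] is only read here.
    for i_src in range(pool_size):
        for i_dst in range(pool_size):
            if i_src != i_dst:
                hashed, hash_const = hashmix(pool[i_src], hash_const)
                pool[i_dst] = mix(pool[i_dst], hashed)
    # Fold in any entropy words that did not fit into the pool.
    for i_src in range(pool_size, len(words)):
        for i_dst in range(pool_size):
            hashed, hash_const = hashmix(words[i_src], hash_const)
            pool[i_dst] = mix(pool[i_dst], hashed)
    return pool


if __name__ == "__main__":
    print(mix_entropy(int(input('Please enter a seed: '))))
```

Because hashmix receives an int and returns a new one, Pool[i_src] cannot be changed as a side effect of computing y. To check the sketch on your own machine, compare its output with [int(v) for v in SeedSequence(seed).pool] for a few seeds.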