pythonlinuxnumpyrandomnumpy-random

`numpy.random.normal` generates different numbers on different systems


I'm comparing the numbers generated with np.random.normal on two different systems (details see below) with the following code (I'm using the legacy np.random.seed because this is used by another program whose output I want to verify eventually) (1) :

import numpy as np

np.random.seed(0)
x = np.random.normal(scale=1e-3, size=10**5)
np.save('test.npy', x)

Then I copy the test.npy from one system to the other where I compare the two versions:

>>> other = np.load('test.npy')
>>> (x != other).sum(), len(x)
(29, 100000)
>>> mask = x != other
>>> np.abs(x[mask] - other[mask])
array([5.42101086e-20, 1.35525272e-20, 2.71050543e-20, 5.42101086e-20,
       1.08420217e-19, 1.08420217e-19, 2.16840434e-19, 2.16840434e-19,
       1.35525272e-20, 1.08420217e-19, 1.08420217e-19, 5.42101086e-20,
       2.71050543e-20, 1.08420217e-19, 2.16840434e-19, 5.42101086e-20,
       2.71050543e-20, 2.16840434e-19, 2.16840434e-19, 2.71050543e-20,
       2.71050543e-20, 1.08420217e-19, 1.08420217e-19, 1.08420217e-19,
       5.42101086e-20, 1.08420217e-19, 1.08420217e-19, 5.42101086e-20,
       2.71050543e-20])
>>> x[mask]
array([ 4.52489093e-04,  9.78961454e-05, -1.47113076e-04, -3.67859222e-04,
       -5.33279620e-04,  8.40794952e-04, -7.75987295e-04,  1.34205479e-03,
        6.34459482e-05,  5.07109360e-04, -7.68363366e-04,  3.33350262e-04,
       -2.19367067e-04,  6.11402140e-04, -1.30486526e-03, -4.42699624e-04,
        1.45463287e-04, -1.22491651e-03,  1.05226781e-03, -2.43032730e-04,
       -2.40551279e-04,  4.95396595e-04, -7.25454745e-04, -8.50779215e-04,
       -2.66274662e-04,  7.28854386e-04,  8.38515107e-04,  3.36152654e-04,
       -1.26550328e-04])

So there are 29 out of 100,000 elements that differ by a tiny amount. However, I do not understand where this difference comes from. I verified that I have the same version of Python and NumPy installed on both systems: python==3.9.4 and numpy==1.20.2 (obtained via python -m pip install numpy==1.20.2; but I also checked with the latest version numpy==1.23.0 and the results are exactly the same). I verified that the RNG state (via np.random.get_state()) is the same on both systems before and after the call to np.random.normal. I saved and copied the test.npy file multiple times and I also verified it via MD5 checksum, so the difference must originate in the random number generation itself (1). However, I can't see how this is possible, given that both are initiated with the same random state.

Information about systems

System A (the one where test.npy is saved):

$ uname -a
Linux SystemA 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

(I also tested on another system A2 which has the same OS version installed as A but is equipped with different CPUs, but the results didn't change from A to A2, i.e. I suspect it's the OS version).

System B (the one where test.npy is loaded):

$ uname -a
Linux SystemB 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Footnote (1): When I use the recommend way given in the docs of np.random.seed, i.e. rs = RandomState(MT19937(SeedSequence(0))), I find that the discrepancy between the two systems remains. However, when I use np.random.default_rng(seed=0) instead, i.e. the new PCG64, I find that the discrepancy goes away.


Solution

  • Given that the differences are all so small, it suggests that the underlying bit-generators are doing the same things. It's just to do with differences between the underlying math library.

    NumPy's legacy generator uses sqrt and log from libm, and you can see that it pulls in these symbols by first finding the shared object providing the generator via:

    import numpy as np
    
    print(np.random.mtrand.__file__)
    

    then dumping symbols with:

    nm -C -gD mtrand.*.so | grep GLIBC
    

    where that mtrand filename comes from the above output.

    I get a lot of other symbols output, but that might explain the differences.

    At a guess it's to do with the log implementation, so you could test with:

    import numpy as np
    
    np.random.seed(0)
    
    x = 2 * np.random.rand(2, 10**5) - 1
    r2 = np.sum(x * x, axis=0)
    
    np.save('test-log.npy', np.log(r2))
    

    and compare between these two systems.