I'm comparing the numbers generated with np.random.normal
on two different systems (details see below) with the following code (I'm using the legacy np.random.seed
because this is used by another program whose output I want to verify eventually) (1) :
import numpy as np
np.random.seed(0)
x = np.random.normal(scale=1e-3, size=10**5)
np.save('test.npy', x)
Then I copy the test.npy
from one system to the other where I compare the two versions:
>>> other = np.load('test.npy')
>>> (x != other).sum(), len(x)
(29, 100000)
>>> mask = x != other
>>> np.abs(x[mask] - other[mask])
array([5.42101086e-20, 1.35525272e-20, 2.71050543e-20, 5.42101086e-20,
1.08420217e-19, 1.08420217e-19, 2.16840434e-19, 2.16840434e-19,
1.35525272e-20, 1.08420217e-19, 1.08420217e-19, 5.42101086e-20,
2.71050543e-20, 1.08420217e-19, 2.16840434e-19, 5.42101086e-20,
2.71050543e-20, 2.16840434e-19, 2.16840434e-19, 2.71050543e-20,
2.71050543e-20, 1.08420217e-19, 1.08420217e-19, 1.08420217e-19,
5.42101086e-20, 1.08420217e-19, 1.08420217e-19, 5.42101086e-20,
2.71050543e-20])
>>> x[mask]
array([ 4.52489093e-04, 9.78961454e-05, -1.47113076e-04, -3.67859222e-04,
-5.33279620e-04, 8.40794952e-04, -7.75987295e-04, 1.34205479e-03,
6.34459482e-05, 5.07109360e-04, -7.68363366e-04, 3.33350262e-04,
-2.19367067e-04, 6.11402140e-04, -1.30486526e-03, -4.42699624e-04,
1.45463287e-04, -1.22491651e-03, 1.05226781e-03, -2.43032730e-04,
-2.40551279e-04, 4.95396595e-04, -7.25454745e-04, -8.50779215e-04,
-2.66274662e-04, 7.28854386e-04, 8.38515107e-04, 3.36152654e-04,
-1.26550328e-04])
So there are 29 out of 100,000 elements that differ by a tiny amount. However, I do not understand where this difference comes from. I verified that I have the same version of Python and NumPy installed on both systems: python==3.9.4
and numpy==1.20.2
(obtained via python -m pip install numpy==1.20.2
; but I also checked with the latest version numpy==1.23.0
and the results are exactly the same). I verified that the RNG state (via np.random.get_state()
) is the same on both systems before and after the call to np.random.normal
. I saved and copied the test.npy
file multiple times and I also verified it via MD5 checksum, so the difference must originate in the random number generation itself (1). However, I can't see how this is possible, given that both are initiated with the same random state.
System A (the one where test.npy
is saved):
$ uname -a
Linux SystemA 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
(I also tested on another system A2 which has the same OS version installed as A but is equipped with different CPUs, but the results didn't change from A to A2, i.e. I suspect it's the OS version).
System B (the one where test.npy
is loaded):
$ uname -a
Linux SystemB 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Footnote (1): When I use the recommend way given in the docs of np.random.seed
, i.e. rs = RandomState(MT19937(SeedSequence(0)))
, I find that the discrepancy between the two systems remains. However, when I use np.random.default_rng(seed=0)
instead, i.e. the new PCG64
, I find that the discrepancy goes away.
Given that the differences are all so small, it suggests that the underlying bit-generators are doing the same things. It's just to do with differences between the underlying math library.
NumPy's legacy generator uses sqrt
and log
from libm
, and you can see that it pulls in these symbols by first finding the shared object providing the generator via:
import numpy as np
print(np.random.mtrand.__file__)
then dumping symbols with:
nm -C -gD mtrand.*.so | grep GLIBC
where that mtrand
filename comes from the above output.
I get a lot of other symbols output, but that might explain the differences.
At a guess it's to do with the log
implementation, so you could test with:
import numpy as np
np.random.seed(0)
x = 2 * np.random.rand(2, 10**5) - 1
r2 = np.sum(x * x, axis=0)
np.save('test-log.npy', np.log(r2))
and compare between these two systems.