Tags: numpy, precision, apple-m1, float32

np.float32 floating point differences between Intel MacBook and M1


I have recently upgraded my Intel MacBook Pro 13" to a MacBook Pro 14" with an M1 Pro. I have been working hard on getting my software to compile and work again. Fortunately there were no big issues, except for floating point problems in some obscure Fortran code and in Python. With regard to Python/NumPy I have the following question.

I have a large code base, but for simplicity I will use this simple function, which converts flight level to pressure, to illustrate the issue.

import numpy as np

def fl2pres(FL):
    """Convert flight level (hundreds of feet) to pressure [Pa] using the ISA."""
    P0 = 101325         # sea-level pressure [Pa]
    T0 = 288.15         # sea-level temperature [K]
    T1 = 216.65         # temperature at the tropopause [K]
    g = 9.80665         # gravitational acceleration [m/s^2]
    R = 287.0528742     # specific gas constant of air [J/(kg K)]
    GAMMA = 0.0065      # temperature lapse rate [K/m]
    P11 = P0*np.exp(-g/GAMMA/R*np.log(T0/T1))   # pressure at 11 km [Pa]

    h = FL*30.48        # flight level (hundreds of feet) to altitude [m]

    # Troposphere formula below 11 km, isothermal stratosphere above it
    return np.where(h <= 11000,
                    P0*np.exp(-g/GAMMA/R*np.log(T0/(T0 - GAMMA*h))),
                    P11*np.exp(-g/R/T1*(h - 11000)))

When I run the code on my M1 Pro, I get:

In [2]: fl2pres(np.float64([400, 200]))
Out[2]: array([18753.90334892, 46563.239766  ])

and:

In [3]: fl2pres(np.float32([400, 200]))
Out[3]: array([18753.90234375, 46563.25080916])

Doing the same on my older Intel MacBook Pro I get:

In [2]: fl2pres(np.float64([400, 200]))
Out[2]: array([18753.90334892, 46563.239766  ])

and:

In [3]: fl2pres(np.float32([400, 200]))
Out[3]: array([18753.90429688, 46563.24778944])

The float64 calculations match, but the float32 ones do not. We use float32 quite a lot throughout our code for memory optimisation. I understand that this sort of floating point error can occur due to architectural differences, but I was wondering whether a simple fix is possible, as some unit-tests currently fail. I could include the architecture in these tests, but I am hoping for an easier solution.
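
For illustration, a minimal sketch of what a tolerance-based check could look like, using the fl2pres defined above; the expected reference values and the rtol below are illustrative assumptions, not our actual test data:

import numpy as np

# Hypothetical tolerance-based test: compare against reference values with
# a relative tolerance rather than exact equality. float32 eps is ~1.2e-7,
# so rtol=1e-6 leaves room for a few ULPs of platform-dependent rounding.
expected = np.array([18753.90, 46563.25])       # illustrative reference values
result = fl2pres(np.float32([400, 200]))

np.testing.assert_allclose(result, expected, rtol=1e-6)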

Converting all inputs to float64 makes my unit-tests pass and hence fixes this issue, but since we work with quite a few large arrays and dataframes, the impact on memory is unwanted.
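
For what it's worth, a minimal sketch of that float64 workaround with the cast confined to the call; fl2pres_via_f64 is a hypothetical wrapper (not something in our code base), so the arrays we keep around stay float32 and only the temporaries are double precision:

import numpy as np

# Hypothetical wrapper: upcast to float64 only for the computation so the
# transcendental functions run in double precision, then cast the result
# back to float32 for storage. Only the temporary copies use extra memory.
def fl2pres_via_f64(FL):
    FL64 = np.asarray(FL, dtype=np.float64)   # temporary float64 copy
    return fl2pres(FL64).astype(np.float32)   # store the result as float32

fl2pres_via_f64(np.float32([400, 200]))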

Both laptops run Python 3.9.10 installed through Homebrew, pandas 1.4.1 and numpy 1.22.3 (built to link against Accelerate/BLAS).

EDIT I have changed the function to print intermediate values to see where the differences occur:

def fl2pres(FL):
    P0 = 101325
    T0 = 288.15
    T1 = 216.65
    g = 9.80665
    R = 287.0528742
    GAMMA = 0.0065
    P11 = P0*np.exp(-g/GAMMA/R*np.log(T0/T1))

    h = FL*30.48
    A = np.log(T0/(T0 - GAMMA*h))           # troposphere log term
    B = np.exp(-g/GAMMA/R*A)                # troposphere pressure ratio
    C = np.exp(-g/R/T1*(h - 11000))         # stratosphere pressure ratio
    print(f"P11:{P11}, h:{h}, A:{A}, B:{B}, C:{C}")
    return np.where(h <= 11000, P0*B, P11*C)

Running this function with the same float32 input as above, I get on the M1 Pro:

P11:22632.040591374975, h:[12192.  6096.], A:[0.32161594 0.14793371], B:[0.1844504  0.45954345], C:[0.82864394 2.16691503]
array([18753.90334892, 46563.239766  ])

On Intel:

P11:22632.040591374975, h:[12192.  6096.], A:[0.32161596 0.14793368], B:[0.18445034 0.45954353], C:[0.828644 2.166915]
array([18753.90429688, 46563.24778944])

Solution

  • As per the issue I created on NumPy's GitHub:

    the differences you are experiencing seem to be all within a single "ULP" (unit in the last place), maybe 2? For special math functions, like exp, or sin, small errors are unfortunately expected and can be system dependent (both hardware and OS/math libraries).

    One thing that might have a slightly larger effect could be the use of SVML, which NumPy has on newer machines (i.e. only on the Intel one). That can be disabled at build time using NPY_DISABLE_SVML=1 as an environment variable, but I don't think you can disable its use without building NumPy. (However, right now, it may well be that the M1 machine is the less precise one, or that they are both roughly the same, just different.)

    I haven't tried compiling NumPy with NPY_DISABLE_SVML=1; my plan now is to use a Docker container that can run on all my platforms and provides a single "truth" for my tests.
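
As a sanity check on the "within one or two ULP" remark, the float32 outputs quoted above can be compared directly with NumPy's testing helper; a small sketch (the maxulp bound is my own choice):

import numpy as np

# The float32 results from the two machines, as quoted in the question.
m1    = np.float32([18753.90234375, 46563.25080916])
intel = np.float32([18753.90429688, 46563.24778944])

# Raises an AssertionError if any element pair differs by more than
# maxulp units in the last place; these two arrays differ by about 1 ULP.
np.testing.assert_array_max_ulp(m1, intel, maxulp=2)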