python numpy row-major-order column-major-order

What causes the difference in array sum along an axis for C- versus F-ordered arrays in numpy?


I am curious if anyone can explain what exactly leads to the discrepancy in this particular handling of C versus Fortran ordered arrays in numpy. See the code below:

system:
Ubuntu 18.10
Miniconda python 3.7.1
numpy 1.15.4

import numpy as np

def test_array_sum_function(arr):
    idx = 0
    val1 = arr[idx, :].sum()     # sum of the 1-D row view
    val2 = arr.sum(axis=1)[idx]  # reduce along axis 1, then select row idx
    print('axis sums:', val1)
    print('          ', val2)
    print('    equal:', val1 == val2)
    print('total sum:', arr.sum())

n = 2_000_000
np.random.seed(42)
rnd = np.random.random(n)

print('Fortran order:')
arrF = np.zeros((2, n), order='F')
arrF[0, :] = rnd
test_array_sum_function(arrF)

print('\nC order:')
arrC = np.zeros((2, n), order='C')
arrC[0, :] = rnd
test_array_sum_function(arrC)

prints:

Fortran order:
axis sums: 999813.1414744433
           999813.1414744079
    equal: False
total sum: 999813.1414744424

C order:
axis sums: 999813.1414744433
           999813.1414744433
    equal: True
total sum: 999813.1414744433

Solution

  • This is almost certainly a consequence of numpy sometimes using pairwise summation and sometimes not.

    Let's build a diagnostic array:

    eps = (np.nextafter(1.0, 2)-1.0) / 2
    1+eps+eps+eps
    # 1.0
    (1+eps)+(eps+eps)
    # 1.0000000000000002
    
    X = np.full((32, 32), eps)
    X[0, 0] = 1
    X.sum(0)[0]
    # 1.0
    X.sum(1)[0]
    # 1.000000000000003
    X[:, 0].sum()
    # 1.000000000000003
    

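    This strongly suggests that 1D arrays and contiguous axes use pairwise summation while strided axes in a multidimensional array don't. If so, it would explain your result: in the Fortran-ordered (2, n) array, axis 1 is the strided axis, so arr.sum(axis=1) accumulates element by element, whereas arr[idx, :] is a 1D array and gets pairwise summation. In the C-ordered array, axis 1 is contiguous, so both expressions take the pairwise path and agree.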
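To check which axis is contiguous in each layout, one can look at the strides. This is only a sketch: the size `n = 1000` is an arbitrary small value chosen for inspection, and the dtype is assumed to be the default float64 (8 bytes per element).

```python
import numpy as np

n = 1000  # arbitrary small size, only used to inspect the memory layout

arrF = np.zeros((2, n), order='F')
arrC = np.zeros((2, n), order='C')

# strides are the byte steps taken when moving one index along each axis
print(arrF.strides)  # (8, 16): axis 0 is contiguous, axis 1 is strided
print(arrC.strides)  # (8000, 8): axis 1 is contiguous

# a single row is a 1-D view in both cases
print(arrF[0, :].ndim, arrF[0, :].strides)
```

In the F-ordered array, stepping along axis 1 skips over the other row, which matches the "strided axis" case described above.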

    Note that the difference between the two summation strategies only shows up when the array is large enough; otherwise numpy falls back to ordinary summation.
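For intuition, here is a minimal pure-Python sketch of the two strategies. It is not numpy's actual implementation: `naive_sum`, `pairwise_sum`, and the block size of 8 are hypothetical names and values chosen for illustration. It uses the same kind of data as the diagnostic array (one 1.0 followed by values of half a machine epsilon) and compares both results against `math.fsum`, which returns a correctly rounded sum.

```python
import math
import sys

def naive_sum(values):
    """Accumulate left to right into a single running total."""
    total = 0.0
    for v in values:
        total += v
    return total

def pairwise_sum(values, block=8):
    """Recursively split the input in half; sum small blocks naively.

    The block size is an illustrative assumption, not numpy's value.
    """
    n = len(values)
    if n <= block:
        return naive_sum(values)
    mid = n // 2
    return pairwise_sum(values[:mid], block) + pairwise_sum(values[mid:], block)

# half the spacing between 1.0 and the next float, like eps in the answer
eps = sys.float_info.epsilon / 2
values = [1.0] + [eps] * 31

reference = math.fsum(values)  # correctly rounded sum
print("naive:    ", naive_sum(values))  # every 1.0 + eps rounds back to 1.0
print("pairwise: ", pairwise_sum(values))
print("reference:", reference)
```

The naive loop loses every small term because each one is individually rounded away against the running total of 1.0, while the pairwise version first combines the small terms with each other, so most of them survive.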