Tags: python, numpy, broadcasting

Product and sum without additional memory allocation


Is there a way to multiply two arrays and sum along an axis (or multiple axes) without allocating extra memory?

In this example:

import numpy as np

A = np.random.random((10, 10, 10))
B = np.random.random((10, 10, 10))

C = np.sum(A[:, None, :, :, None] * B[None, :, None, :, :], axis=(-1,-2))

When computing C, an intermediate array of shape 10x10x10x10x10 is created only to be collapsed immediately. Is there a way to avoid this in numpy?
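
For scale, the broadcast product here is a float64 array of shape (10, 10, 10, 10, 10), i.e. 100,000 elements, which exists only to be summed away:

tmp = A[:, None, :, :, None] * B[None, :, None, :, :]
print(tmp.shape)   # (10, 10, 10, 10, 10)
print(tmp.nbytes)  # 800000 bytes (~800 kB) even for this small example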


Solution

  • For the 2-D case, this is a matrix product with the transpose of the second array:

    C = A @ B.T
    

    NB. The original version of the question asked about the 2-D operation C = np.sum(A[:, None, :] * B[None, :, :], axis=-1), which is what C = A @ B.T replaces.

    Quick check:

    # 2-D arrays for the simpler operation (shapes chosen arbitrarily for the check)
    A = np.random.random((5, 4))
    B = np.random.random((3, 4))

    C1 = np.sum(A[:, None, :] * B[None, :, :], axis=-1)
    C2 = A @ B.T

    assert np.allclose(C1, C2)
    

    You can generalize with einsum:

    np.einsum('ikl,jlm->ijk', A, B)
    

    Quick check:

    A = np.random.random((2, 3, 4))
    B = np.random.random((2, 4, 5))
    
    #             i    j   k  l    m         i   j    k   l  m
    C1 = np.sum(A[:, None, :, :, None] * B[None, :, None, :, :], axis=(-1,-2))
    C2 = np.einsum('ikl,jlm->ijk', A, B)
    
    assert np.allclose(C1, C2)
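
    Unlike the broadcast-and-sum expression, einsum contracts the l and m axes directly, so the full 5-D intermediate from the question is not materialized. A sketch with the question's shapes (optimize=True just lets NumPy choose a contraction path, possibly routed through BLAS):

    A = np.random.random((10, 10, 10))
    B = np.random.random((10, 10, 10))

    # Builds and immediately reduces a (10, 10, 10, 10, 10) intermediate
    C1 = np.sum(A[:, None, :, :, None] * B[None, :, None, :, :], axis=(-1, -2))

    # Contracts l and m without materializing the broadcast product
    C2 = np.einsum('ikl,jlm->ijk', A, B, optimize=True)

    assert np.allclose(C1, C2)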