Tags: python, arrays, numpy, memory, numexpr

What could be the best way to bypass `MemoryError` in this case?


I have two fairly large NumPy arrays: arr1 of shape (40, 40, 3580) and arr2 of shape (3580, 50). What I want to compute is

arr_final = np.sum(arr1[..., None]*arr2, axis = 2)

such that arr_final has shape (40, 40, 50). However, evaluating the expression above seems to build the full broadcast product, of shape (40, 40, 3580, 50), before summing, so I keep getting a MemoryError. Is there a way to avoid that intermediate array and compute only the final result? I have looked at numexpr, but I am not sure how to express arr1[..., None]*arr2 followed by a sum over axis=2 in numexpr. Any help or suggestions would be appreciated.


Solution

  • Assuming you meant np.sum(arr1[..., None]*arr2, axis = 2), with a ... instead of a :, then that's just dot:

    arr3 = arr1.dot(arr2)
    

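    For an N-dimensional first argument and a 2-D second argument, dot sums over the last axis of arr1 and the first axis of arr2, so arr3 has shape (40, 40, 50). This should be more efficient than explicitly materializing arr1[..., None]*arr2, but I don't know exactly what intermediates it allocates.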

    You can also express the computation with einsum. As with dot, this should be more efficient than explicitly materializing arr1[..., None]*arr2, but I don't know exactly what it allocates.

    arr3 = numpy.einsum('ijk,kl->ijl', arr1, arr2)
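
As a quick sanity check, the following minimal sketch compares the broadcast-and-sum expression with dot and einsum on small stand-in arrays. The shapes (4, 5, 30) and (30, 6) and the float64 dtype are assumptions chosen for illustration, not the asker's data. It also estimates how large the broadcast intermediate would be at the shapes from the question, which comes to roughly 2.29 GB under the float64 assumption.

```python
import numpy as np

# Small stand-in shapes (assumption): same layout as (40, 40, 3580) and (3580, 50).
rng = np.random.default_rng(0)
arr1 = rng.standard_normal((4, 5, 30))
arr2 = rng.standard_normal((30, 6))

# Reference: broadcasting builds a (4, 5, 30, 6) intermediate before summing.
reference = np.sum(arr1[..., None] * arr2, axis=2)

# The two formulations from the answer, which contract over the shared axis directly.
via_dot = arr1.dot(arr2)
via_einsum = np.einsum('ijk,kl->ijl', arr1, arr2)

shapes_match = reference.shape == via_dot.shape == via_einsum.shape == (4, 5, 6)
results_match = np.allclose(reference, via_dot) and np.allclose(reference, via_einsum)

# Size of the broadcast intermediate at the question's shapes, assuming float64.
intermediate_bytes = 40 * 40 * 3580 * 50 * np.dtype(np.float64).itemsize

print(shapes_match, results_match, intermediate_bytes / 1e9)
```

The check only confirms that the three expressions agree numerically. It does not measure what dot or einsum allocate internally.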