Tags: python, numpy, memory, matrix-multiplication, dot-product

Numpy matmul requires way more memory than is necessary when inputs are of different types


I have a very large boolean 2d array (~1 GB) and I want to matmul it with a 1d array of float64. The output array would be large, but still considerably smaller than the 2d array. However, when I try to matmul them, my system freaks out and tries to allocate enough memory to hold the entire 2d array as float64, rather than just enough for the output vector.

import numpy as np
x,y=286880, 20419
a = np.random.randint(0, 2, (x, y), dtype=np.bool_)  # already shaped (x, y); no reshape needed
v=np.random.rand(y)
print(f'{a.shape=} {a.dtype=} {a.size * a.itemsize=}')
print(f'{v.shape=} {v.dtype=} {v.size * v.itemsize=}')
d = np.dot(a, v)
Output: 
a.shape=(286880, 20419) a.dtype=dtype('bool') a.size * a.itemsize=5857802720
v.shape=(20419,) v.dtype=dtype('float64') v.size * v.itemsize=163352
Traceback (most recent call last):
  File "...", line 7, in <module>
    d = np.dot(a, v)
numpy.core._exceptions.MemoryError: Unable to allocate 43.6 GiB for an array with shape (286880, 20419) and data type float64

Why does numpy need to allocate so much memory just for a matrix multiplication? There seems to be no reason to require extra memory for computing a dot product, since you can just keep a running total as you iterate through the columns.

Is there a time efficient and memory efficient method I can use here to get the dot product?

PS: I feel like maybe there's something extra efficient I can use here because the matrix is boolean?


Solution

  • Numpy does not internally support operations on mixed dtypes, so it uses type promotion to first convert the inputs to a common type (the conversion is done out of place). Indeed, when Numpy executes np.dot(a, v), it first converts a from np.bool_ to np.float64, because the promotion rules specify that bool_ BIN_OP float64 = float64. The problem is that a is huge here, so the conversion causes a big memory issue: np.bool_ generally takes 1 byte per element while float64 takes 8, so the temporary copy is 8 times the size of the original, and roughly 9 times the original's memory is in use at once. This is clearly not reasonable.
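You can check the promotion rule itself without allocating anything, using `np.result_type`:

```python
import numpy as np

# Mixing bool with float64 promotes to float64, which is why
# np.dot(a, v) materializes a float64 copy of the boolean array.
promoted = np.result_type(np.bool_, np.float64)
print(promoted)  # float64
```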

    The reason Numpy does not support operations on mixed types directly is that it would be too complex to maintain (a bigger code base, much slower compilation, and more complex code often resulting in more bugs) and would make the Numpy binaries much larger.


    There are several solutions to address this issue.

    The simplest solution is to split the big array into chunks so that only one chunk is converted at a time. This is slightly more complex and a bit slower than the naive approach, but the overhead is small as long as the chunks are big enough. In practice, both the naive implementation and this approach remain fairly inefficient due to the slow implicit conversion.
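The chunked approach can be sketched as follows (a minimal sketch; the `chunked_dot` name and the default chunk size are arbitrary choices, not part of any library):

```python
import numpy as np

def chunked_dot(a, v, chunk_size=4096):
    """Compute a @ v one row-chunk at a time, so that only one
    chunk of `a` is promoted to float64 at any moment."""
    out = np.empty(a.shape[0], dtype=np.result_type(a.dtype, v.dtype))
    for start in range(0, a.shape[0], chunk_size):
        end = min(start + chunk_size, a.shape[0])
        # Only this (chunk_size, y) slice gets converted to float64
        out[start:end] = a[start:end] @ v
    return out
```

The peak extra memory is now `chunk_size * y * 8` bytes instead of `x * y * 8`, at the cost of a Python-level loop over chunks.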

    An alternative solution is to use Numba or Cython to generate fast code (thanks to a JIT/native compiler). Numba can perform the conversion required by the type promotion on the fly, avoiding the allocation of big temporary buffers. This is also faster because the operation becomes more compute-bound and memory is slow: the compiler can generate clever SIMD instructions to convert parts of the boolean array to floats efficiently. Additionally, the operation is more cache-friendly, since the smaller boolean input fits better in cache. Note that you need to write plain loops to get the benefit from Numba/Cython: calling np.dot(a, v) directly in Numba or Cython will not be better than in Numpy.

    Note that if a is sparse, that is, it mainly contains zeros, then it is certainly better to use sparse matrices. Indeed, sparse matrices tend to be even more compact in memory, assuming the matrix is sparse enough. This also avoids many unneeded multiplications by zero.
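With SciPy this is straightforward (a sketch assuming SciPy is available; the sizes and ~1% density below are made up for illustration):

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
# A mostly-zero boolean matrix (assumed ~1% density for the example)
a = rng.random((1000, 500)) < 0.01
v = rng.random(500)

a_sparse = sparse.csr_matrix(a)  # stores only the nonzero entries
d = a_sparse @ v                 # dot product touching only the nonzeros
```

For a truly sparse matrix, the CSR representation can be far smaller than even the 1-byte-per-element boolean array, and the product skips every zero.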