python numpy memory indexing mask

NumPy memory error when masking along only certain axes, despite having sufficient RAM


I have a large array and I want to mask out certain values (set them to nodata). But I'm experiencing an out-of-memory error despite having sufficient RAM.

The example below reproduces my situation. My array is 14.5 GB and the mask is ~7 GB, but I have 64 GB of RAM dedicated to this, so I don't understand why it fails.

    import numpy as np

    arr = np.zeros((1, 71829, 101321), dtype='uint16')
    arr.nbytes
    # 14555572218

    mask = np.random.randint(2, size=(71829, 101321), dtype='bool')
    mask.nbytes
    # 7277786109

    nodata = 0

    # this results in an OOM error
    arr[:, mask] = nodata

Interestingly, if I do the following, then things work.

    arr = np.zeros((71829, 101321), dtype='uint16')
    arr.nbytes
    # 14555572218

    mask = np.random.randint(2, size=(71829, 101321), dtype='bool')
    mask.nbytes
    # 7277786109

    nodata = 0

    # this works
    arr[mask] = nodata

But I can't use this. The code will be part of a library module that must accept arrays whose zeroth dimension varies in size.

My guess is that arr[mask] = nodata is modifying the array in-place but arr[:, mask] = nodata is creating a new array, but I don't know why that would be the case. Even if it did, there should still be enough space for that, since the total size of arr and mask would be 22GB and I have 64GB of RAM.
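For reference, the 22 GB figure can be reproduced without allocating anything. This is only a sketch of the arithmetic, using the shape and dtypes from the example above; the variable names are mine.

```python
import numpy as np

# Shape of one 2-D slice, taken from the example above.
shape = (71829, 101321)
n_elements = shape[0] * shape[1]

# Bytes per element come from the dtypes used in the example.
arr_bytes = n_elements * np.dtype("uint16").itemsize   # 2 bytes each
mask_bytes = n_elements * np.dtype("bool").itemsize    # 1 byte each
total_bytes = arr_bytes + mask_bytes

print(arr_bytes)    # 14555572218
print(mask_bytes)   # 7277786109
print(f"{total_bytes / 1e9:.1f} GB")   # 21.8 GB
```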

I searched for this and found a related question, but I'm new to NumPy and didn't understand the explanation in the longer answer. I did try the np.where approach from the other answer to that question, but I still get an OOM error.

Any input would be appreciated.


Solution

  • I suspect the issue here is that combining slice-based and mask-based indexing leads to a memory-inefficient codepath. You might try expressing it this way so that you're using entirely mask-based indexing:

    arr[mask[None]] = nodata
    

    I don't know enough about the implementation of np.ndarray.__setitem__ to guess at why the arr[:, mask] version leads to memory issues.
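One possible explanation, offered as an assumption rather than something verified against the NumPy source: the NumPy indexing documentation describes a boolean array that is part of a larger index as equivalent to inserting mask.nonzero() at that position. If those integer index arrays are actually materialised for arr[:, mask], a mask that is about half True would need two 8-byte index arrays of roughly 3.6 billion entries each, which is on the order of 58 GB on top of the 22 GB already in use.

Note also that a boolean index must match the indexed dimensions exactly, so arr[mask[None]] only fits when the zeroth dimension has length 1. A minimal sketch for a variable zeroth dimension is to apply the plain 2-D mask to one slice at a time. The helper name mask_each_slice and the small shapes are hypothetical, chosen only so the example runs quickly, and I have not measured its memory use on the full-size arrays.

```python
import numpy as np

# Small stand-ins for the real data; these shapes are illustrative only.
rng = np.random.default_rng(0)
n_bands, height, width = 3, 4, 5
arr = rng.integers(1, 100, size=(n_bands, height, width), dtype=np.uint16)
mask = rng.random((height, width)) < 0.5
nodata = 0

# Reference result using the mixed slice + mask form from the question.
expected = arr.copy()
expected[:, mask] = nodata


def mask_each_slice(a, m, value):
    """Hypothetical helper: set a[i][m] = value for every slice along axis 0.

    Assumes m has the same shape as a single slice a[i].
    """
    for band in a:          # each `band` is a view into `a`, not a copy
        band[m] = value     # pure boolean-mask assignment, done in place
    return a


result = mask_each_slice(arr.copy(), mask, nodata)
same = np.array_equal(result, expected)

# Rough size of the two integer index arrays in the hypothesis above,
# assuming 8-byte indices and a mask that is about half True.
n_true_estimate = (71829 * 101321) // 2
index_bytes_estimate = 2 * n_true_estimate * 8
print(f"{index_bytes_estimate / 1e9:.1f} GB")
```

On small arrays the loop gives the same result as arr[:, mask] = nodata, and each iteration uses the same full-mask form that already worked in the 2-D case.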