pythonnumpypython-xarray

Finding fully non-nan windows in DataArray


I have a three-dimensional DataArray and I would like to identify the indices where a 3x3 window in the last two dimensions does not contain any nans.

Consider toy data like so:

np.random.seed(1234)
arr = np.random.rand(20, 10, 10)
arr[arr < 0.1] = np.nan
arr = xr.DataArray(arr, dims=["time", "x", "y"])

My approach is to make a rolling dataset and reduce like below. Note that I cannot use arr.rolling(...).construct(...) because my actual data is too large.

result = arr.rolling(center=True, x=3, y=3)\
    .reduce(lambda x, axis: np.all(~np.isnan(x), axis=axis))

This seems like it works, but why is the output a mixture of nans and 1s instead of nans, 1s, and 0s?

> result.isel(time=0)

array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan,  1.,  1., nan, nan, nan, nan],
       [nan, nan, nan, nan,  1.,  1., nan, nan, nan, nan],
       [nan, nan, nan, nan,  1., nan, nan, nan,  1., nan],
       [nan, nan, nan, nan, nan, nan, nan, nan,  1., nan],
       [nan, nan, nan, nan, nan, nan, nan, nan,  1., nan],
       [nan,  1., nan, nan, nan, nan, nan, nan, nan, nan],
       [nan,  1.,  1.,  1., nan, nan, nan, nan, nan, nan],
       [nan,  1.,  1.,  1.,  1.,  1.,  1., nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])

I understand that the outer border is nan because a 3x3 window cannot be constructed there. But I don't understand why nans are showing up in the interior of the array as well. If a window has nan, shouldn't np.all return false and then cast that to zero?

I only care about the cells with 1s so this isn't a big deal. But, I'm curious to know why this is happening.


Solution

  • Tried a few approaches, and I settled on

    indices = np.where(
        array.rolling(...)
        .mean()
        .notnull()
    )
    

    This was able to handle the large array without using more than a few GB of RAM when the array is on disk. It used even less when the array is backed by dask. Credit goes to ThomasMGeo on the Pangeo forum. I suspect that calling .construct() isn't actually using that much memory, but a .stack() call I had in an earlier version was using a lot of memory.