
How to prevent the `to_zarr` method in xarray from writing all-NaN chunks to disk?


I want to save a very large 2-dimensional Zarr array, chunked equally along both dimensions (X, X), that occasionally contains chunks made up entirely of NaNs. To reduce the number of chunks written to disk, I want xarray's to_zarr method to skip writing such chunks altogether.

Here is some code to emulate it:

import numpy as np
import xarray as xr

n = 100 # this could get as large as 400K, leaving it small for simplicity
n_chunk = 50 # chunk size
n_delete = 1 # number of random chunks to change to nans
lat = np.linspace(1, 2, n)
lon = np.linspace(1, 2, n)
data = np.random.random((n, n))

# Enumerate the (row, column) index of every chunk
all_c = [(i, j) for i in range(n // n_chunk) for j in range(n // n_chunk)]

# Pick n_delete distinct chunks at random
delete = np.array(all_c)[np.random.choice(len(all_c), n_delete, replace=False)]
print(np.unique(delete, axis=0).shape)

# Fill each selected chunk with NaN (chunk indices are 0-based)
for i, j in delete:
    data[i * n_chunk:(i + 1) * n_chunk, j * n_chunk:(j + 1) * n_chunk] = np.nan
    
xarr = xr.DataArray(data=data, name="test", dims=["lat", "lon"], coords=dict(lat=lat, lon=lon))
xarr = xarr.chunk((n_chunk, n_chunk))
xarr.to_dataset().to_zarr(r"C:/experiment.zarr", mode="w", encoding={"test": {"_FillValue": None}})

This writes all the chunks (4 in the case above) to disk, since NaN is still a valid float value. How can I stop it from writing the all-NaN chunks?
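
One way to confirm how many chunks actually land on disk is to count the chunk files under the variable's directory. The helper below is only a sketch: `count_chunk_files` and `METADATA_NAMES` are hypothetical names, and it assumes a local directory store in which every file that is not a known Zarr metadata file is a chunk.

```python
import os

# Metadata file names used by Zarr v2 and v3 directory stores.
# Assumption: any other file under the variable's directory is a chunk.
METADATA_NAMES = {".zarray", ".zattrs", ".zgroup", ".zmetadata", "zarr.json"}


def count_chunk_files(variable_dir):
    """Count the chunk files stored under one variable's directory."""
    count = 0
    # Walk recursively, because some layouts nest chunk keys in subdirectories.
    for _root, _dirs, files in os.walk(variable_dir):
        count += sum(1 for name in files if name not in METADATA_NAMES)
    return count
```

For the 2 x 2 chunk grid above, `count_chunk_files("C:/experiment.zarr/test")` should report 4 when every chunk is written, and fewer once all-NaN chunks are skipped.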


Solution

  • xarray can pass Zarr's write_empty_chunks option through the variable encoding, so you can add it there:

    ds.to_zarr(..., encoding={"test": {..., 'write_empty_chunks': False}})
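
As Zarr documents it, setting write_empty_chunks to False makes Zarr compare each chunk to the array's fill value and skip any chunk that is uniformly equal to it. The NumPy-only sketch below illustrates that rule; it is not Zarr's implementation, and `is_empty_chunk` is a hypothetical helper. It also suggests why the fill value matters: an all-NaN chunk only counts as empty when the fill value is NaN, so it may be worth checking whether the `"_FillValue": None` in the question's encoding prevents the chunks from being skipped.

```python
import numpy as np


def is_empty_chunk(chunk, fill_value):
    """Return True if every element of `chunk` equals `fill_value` (NaN-aware)."""
    if fill_value is None:
        # Assumption for this sketch: with no fill value, nothing counts as empty.
        return False
    if isinstance(fill_value, float) and np.isnan(fill_value):
        # NaN != NaN, so NaN fill values need an explicit isnan check.
        return bool(np.all(np.isnan(chunk)))
    return bool(np.all(chunk == fill_value))


n, n_chunk = 100, 50
data = np.random.random((n, n))
data[:n_chunk, :n_chunk] = np.nan  # blank out the top-left chunk

# Collect the indices of chunks that would be treated as empty.
skipped = [
    (i, j)
    for i in range(n // n_chunk)
    for j in range(n // n_chunk)
    if is_empty_chunk(
        data[i * n_chunk:(i + 1) * n_chunk, j * n_chunk:(j + 1) * n_chunk],
        fill_value=np.nan,
    )
]
print(skipped)  # [(0, 0)]
```

Passing `fill_value=None` instead leaves `skipped` empty in this sketch, which is the case to watch for when the encoding disables the fill value.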