Tags: python, python-xarray, zarr

How can I initialize a Zarr file that is larger than available memory?


My workflow generates an `xr.Dataset` with shape (6, 36, 2, 13, 699, 1920) in float32.

I can process and write the output array chunk by chunk, but only if the Zarr store already exists, using:

ds.to_zarr('data.zarr', region=region)

Does anyone have an idea how to initialize a Zarr store that is larger than available memory?

My libraries are:

zarr-python: '2.18.4'
xarray: '2025.1.2'

Solution

  • I was able to do this with `dask.array`: since dask arrays are lazy, the NaN-filled array is only materialized chunk by chunk as it is written.

    import dask.array as da
    import numpy as np
    import xarray as xr
    
    coords = ...
    dims = ...
    var_name = 'value'
    chunks = (1, 13, 36, 128, 128)
    encoding = {var_name: {'chunks': chunks}}
    store = 'test.zarr'
    
    # Lazy array: nothing is allocated until chunks are computed at write time.
    daskarray = da.empty(
        (6, 13, 36, 699, 1920),
        chunks=chunks,
        dtype='float32',
    )
    daskarray[:] = np.nan  # still lazy; NaNs are filled in chunk by chunk
    
    xr.DataArray(
        daskarray,
        coords=coords,
        dims=dims,
    ).to_dataset(name=var_name).to_zarr(store, mode='w', encoding=encoding)