python-xarray zarr blosc

How to match all variables in xarray encoding (blosc, zarr compression)


The xarray documentation gives the following example of how to use zarr compression:


In [42]: import zarr

In [43]: compressor = zarr.Blosc(cname="zstd", clevel=3, shuffle=2)

In [44]: ds.to_zarr("foo.zarr", encoding={"foo": {"compressor": compressor}})
Out[44]: <xarray.backends.zarr.ZarrStore at 0x7f383eeba970>

The encoding mapping says to apply the given compressor to the "foo" variable. But what if I want to apply it to all my variables, no matter how they are named? Would I have to explicitly create the encoding dictionary to match all variables in my Dataset/array, or is there some kind of wildcard pattern? I just want to compress the whole Dataset with the same compressor.


Solution

  • If you want to set the same encoding for all of your variables, you can do that with a simple dict comprehension. Iterating over a Dataset yields the names of its data variables.

    Example:

    import xarray as xr
    import zarr
    
    # test dataset
    ds = xr.tutorial.open_dataset("tiny")
    
    # add second variable
    ds['tiny2'] = ds.tiny*2
    
    compressor = zarr.Blosc(cname="zstd", clevel=3, shuffle=2)
    
    # encodings
    enc = {x: {"compressor": compressor} for x in ds}
    
    # check 
    print(enc)
    
    # {'tiny': {'compressor': Blosc(cname='zstd', clevel=3, shuffle=BITSHUFFLE, blocksize=0)}, 'tiny2': {'compressor': Blosc(cname='zstd', clevel=3, shuffle=BITSHUFFLE, blocksize=0)}}
    
    
    # write the dataset with the same encoding for every variable
    # (x in the comprehension above is the variable name)
    ds.to_zarr("foo.zarr", encoding=enc)
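
If this comes up often, the comprehension can be wrapped in a small helper. The following is only a minimal sketch: `uniform_encoding` is a hypothetical name, not part of xarray or zarr, and a placeholder string stands in for the compressor object so the snippet runs without either library installed. It assumes only that you can pass it an iterable of variable names, such as `ds` or `ds.data_vars`.

```python
def uniform_encoding(names, **settings):
    """Return an encoding dict that maps every name to the same settings.

    Hypothetical helper for illustration; each variable gets its own
    shallow copy of the settings dict.
    """
    return {name: dict(settings) for name in names}


# Stand-in for the variable names of a Dataset.
# With a real Dataset you would pass `ds` (or `ds.data_vars`) instead.
var_names = ["tiny", "tiny2"]

# Placeholder value; in practice this would be the Blosc compressor object.
enc = uniform_encoding(var_names, compressor="<compressor object>")
print(enc)
```

With a real Dataset, the result would be passed on in the same way as above, e.g. `ds.to_zarr("foo.zarr", encoding=uniform_encoding(ds, compressor=compressor))`.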