Tags: python-xarray, zarr, fsspec

open_mfdataset() on a remote zarr store raises zarr.errors.GroupNotFoundError


I'm trying to read a remote zarr store using xarray.open_mfdataset().

It fails with zarr.errors.GroupNotFoundError: group not found at path ''. The full traceback is at the bottom.

import xarray as xr
import s3fs

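fs = s3fs.S3FileSystem(anon=True)  # anonymous access to the public bucket
uri = "s3://era5-pds/zarr/2020/12/data/eastward_wind_at_10_metres.zarr"
file = s3fs.S3Map(uri, s3=fs)  # key-value mapping over the remote store
ds = xr.open_mfdataset(file, engine="zarr")  # raises GroupNotFoundError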

I'm able to open it using xr.open_zarr:

ds = xr.open_zarr(file)

If I download the zarr store locally, it works fine:

import xarray as xr
import s3fs
fs = s3fs.S3FileSystem(anon=True)
fs.get("s3://era5-pds/zarr/2020/12/data/eastward_wind_at_10_metres.zarr/*", "eastward_wind_at_10_metres.zarr", recursive=True)
ds = xr.open_mfdataset("eastward_wind_at_10_metres.zarr", engine="zarr")
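A local path works because open_mfdataset treats a string as a glob pattern. The sketch below mimics only that step with the standard library's glob.glob on a temporary directory named like the store: the pattern has no wildcards, so it matches just the store directory, the result is a one-element list, and the whole store is opened as one dataset. The helper expand_like_mfdataset and the fake directory layout are hypothetical stand-ins, not xarray code or the real ERA5 store.

```python
import glob
import os
import tempfile


def expand_like_mfdataset(path):
    """Hypothetical, simplified stand-in for the string branch of
    open_mfdataset: expand a glob pattern into a sorted list of paths."""
    return sorted(glob.glob(path))


with tempfile.TemporaryDirectory() as tmp:
    # Fake local store: a directory holding a couple of zarr metadata files.
    store = os.path.join(tmp, "eastward_wind_at_10_metres.zarr")
    os.makedirs(store)
    for name in (".zgroup", ".zattrs"):
        open(os.path.join(store, name), "w").close()

    paths = expand_like_mfdataset(store)
    # No wildcards in the pattern, so only the store directory itself matches.
    print([os.path.basename(p) for p in paths])
```

The files inside the directory are never listed as separate paths; the directory is handed over as a single store.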

Traceback for open_mfdataset on the remote zarr store:

>>> ds = xr.open_mfdataset(file, engine="zarr")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py", line 948, in open_mfdataset
    datasets = [open_(p, **open_kwargs) for p in paths]
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py", line 948, in <listcomp>
    datasets = [open_(p, **open_kwargs) for p in paths]
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py", line 572, in open_dataset
    store = opener(filename_or_obj, **extra_kwargs, **backend_kwargs)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/zarr.py", line 296, in open_group
    zarr_group = zarr.open_group(store, **open_kwargs)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/zarr/hierarchy.py", line 1166, in open_group
    raise GroupNotFoundError(path)
zarr.errors.GroupNotFoundError: group not found at path ''

Solution

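  • open_mfdataset expects multiple paths (it is multi-file): a string is expanded as a glob pattern, and anything else is iterated to get the list of paths. An S3Map is a mapping, so iterating over it yields the keys inside the store (.zgroup, .zattrs, ...), and each key is then opened as if it were a separate zarr store, which is most likely what produces GroupNotFoundError at path ''.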

    Both of the following work (the first may require the xarray master branch):

    ds = xr.open_mfdataset(uri, engine="zarr", backend_kwargs=dict(storage_options={'anon': True}))
    
    ds = xr.open_mfdataset([file], engine="zarr")
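The mapping behaviour can be reproduced without S3. An fsspec mapper such as s3fs.S3Map is a Python MutableMapping, and iterating any mapping yields its keys. In this sketch a plain dict stands in for the mapper, and paths_like_mfdataset is a hypothetical helper that only approximates (it is not xarray's code) how a non-string argument is turned into a list of paths. Wrapping the store in a list, as in the second fix, leaves the store itself as the only element.

```python
from collections.abc import Mapping


def paths_like_mfdataset(paths):
    """Hypothetical, simplified stand-in for open_mfdataset's path handling:
    a str would be globbed (not modelled here); anything else is iterated."""
    if isinstance(paths, str):
        raise NotImplementedError("glob branch not modelled in this sketch")
    return [p for p in paths]


# A plain dict standing in for s3fs.S3Map: keys are object names in the store.
store = {
    ".zgroup": b'{"zarr_format": 2}',
    ".zattrs": b"{}",
    "eastward_wind_at_10_metres/.zarray": b"{}",
}
assert isinstance(store, Mapping)

# Passing the mapping directly: every key becomes a "path" opened on its own.
print(paths_like_mfdataset(store))

# Wrapping it in a list: the single store is the only element.
wrapped = paths_like_mfdataset([store])
print(len(wrapped), wrapped[0] is store)
```

This is why xr.open_mfdataset([file], engine="zarr") behaves like xr.open_zarr(file): the list has exactly one entry, the store mapping.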