python-xarray cfgrib fsspec

Reading a remote GRIB file on S3 with xarray and cfgrib


Can the cfgrib engine handle reading remote files? It doesn't look like it, according to Martin Durant's comment (https://github.com/ecmwf/cfgrib/issues/198#issuecomment-772852412).

There is a smallish GRIB file hosted on S3: https://mf-nwp-models.s3.amazonaws.com/index.html#arpege-world/v2/2021-02-16/00/UGRD/10m/ (note: don't click on a file, as that will start a download).

When I try to read it using s3fs, I get:

import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)

uri = "s3://mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2"

file = s3fs.S3Map(uri, s3=fs)
ds = xr.open_dataset(file, engine="cfgrib")

Can't create file '<File-like object S3FileSystem, mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2>.90c91.idx'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 342, in from_indexpath_or_filestream
    with compat_create_exclusive(indexpath) as new_index_file:
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 274, in compat_create_exclusive
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
FileNotFoundError: [Errno 2] No such file or directory: '<File-like object S3FileSystem, mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2>.90c91.idx'
Can't read index file '<File-like object S3FileSystem, mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2>.90c91.idx'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 352, in from_indexpath_or_filestream
    index_mtime = os.path.getmtime(indexpath)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/genericpath.py", line 55, in getmtime
    return os.stat(filename).st_mtime
FileNotFoundError: [Errno 2] No such file or directory: '<File-like object S3FileSystem, mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2>.90c91.idx'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py", line 572, in open_dataset
    store = opener(filename_or_obj, **extra_kwargs, **backend_kwargs)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/cfgrib_.py", line 45, in __init__
    self.ds = cfgrib.open_file(filename, **backend_kwargs)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 650, in open_file
    index = open_fileindex(path, grib_errors, indexpath, index_keys).subindex(filter_by_keys)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 637, in open_fileindex
    return stream.index(index_keys, indexpath=indexpath)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 269, in index
    return FileIndex.from_indexpath_or_filestream(self, index_keys, indexpath)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 370, in from_indexpath_or_filestream
    return cls.from_filestream(filestream, index_keys)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 297, in from_filestream
    for message in filestream:
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/messages.py", line 240, in __iter__
    with open(self.path, 'rb') as file:
TypeError: expected str, bytes or os.PathLike object, not S3File

Solution

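  • I think I got it working with fsspec's local file caching (https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally). As the traceback shows, cfgrib opens the file by path with the built-in open() and tries to write an .idx index file next to it, so it needs a real local path rather than an S3 file-like object. Prefixing the URI with simplecache:: makes fsspec download the file to local storage first, and fsspec.open_local returns the path of that local copy: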

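    import fsspec
    import xarray as xr

    # "simplecache::" makes fsspec download the remote file to local storage first
    uri = "simplecache::s3://mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2"

    # Options are keyed by protocol name: "s3" for the bucket, "simplecache" for the cache
    file = fsspec.open_local(uri, s3={'anon': True}, simplecache={'cache_storage': '/tmp/files'})

    # open_local returns a local path, which cfgrib can open and index
    ds = xr.open_dataset(file, engine="cfgrib")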
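If you need this for more than one file, the same idea can be wrapped in a small helper. This is only a sketch: the names `cached_uri` and `open_remote_grib` and the default cache directory are made up for illustration, and it assumes fsspec, s3fs, xarray and cfgrib are installed. It also passes `indexpath=""` through `backend_kwargs`, which the cfgrib README describes as the way to stop cfgrib writing an .idx file beside the data file; drop that argument if you want the index kept with the cached copy.

```python
import os
import tempfile
from typing import Optional


def cached_uri(uri: str, cache: str = "simplecache") -> str:
    """Prefix a remote URI with an fsspec caching protocol.

    URIs that are already chained (contain "::") are returned unchanged.
    """
    if "::" in uri:
        return uri
    return f"{cache}::{uri}"


def open_remote_grib(uri: str, cache_dir: Optional[str] = None, **storage_options):
    """Hypothetical helper: copy `uri` to a local cache, then open it with cfgrib.

    `storage_options` are keyed by protocol name, e.g. s3={"anon": True}.
    """
    # Imported lazily so cached_uri() can be used without these packages.
    import fsspec
    import xarray as xr

    # Assumed default location; any writable directory works.
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "grib-cache")

    # open_local downloads the file if needed and returns the local path.
    local_path = fsspec.open_local(
        cached_uri(uri),
        simplecache={"cache_storage": cache_dir},
        **storage_options,
    )

    # indexpath="" asks cfgrib not to write a .idx file next to the cached file.
    return xr.open_dataset(
        local_path, engine="cfgrib", backend_kwargs={"indexpath": ""}
    )
```

Usage would then look like `ds = open_remote_grib("s3://mf-nwp-models/arpege-world/v2/2021-02-16/00/UGRD/10m/0h.grib2", s3={"anon": True})`. Note that the whole file is still downloaded before cfgrib reads it; this works around the local-path requirement rather than giving true remote access.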