I'm trying to read multiple NetCDF files at once with xr.open_mfdataset
from an S3 bucket, using s3fs. Is this possible?
I tried the code below, which works with xr.open_dataset
for a single file but not for multiple files:
import s3fs
import xarray as xr
fs = s3fs.S3FileSystem(anon=False)
s3path = 's3://my-bucket/wind_data*'
store = s3fs.S3Map(root=s3path, s3=s3fs.S3FileSystem(), check=False)
data = xr.open_mfdataset(store, combine='by_coords')
I'm not sure exactly what S3Map
does; the s3fs documentation isn't specific about this.
However, I was able to get this working in a Jupyter environment using S3FileSystem.glob()
and S3FileSystem.open().
Here's a code sample:
import s3fs
import xarray as xr
s3 = s3fs.S3FileSystem(anon=False)
# Glob pattern for the files to load
s3path = 's3://your-bucket/your-folder/file_prefix*'
# glob() returns a list of strings with the matching object paths
remote_files = s3.glob(s3path)
# Open each remote file to build a list of file-like objects
fileset = [s3.open(path) for path in remote_files]
# This works
data = xr.open_mfdataset(fileset, combine='by_coords')
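If you need this in more than one place, the glob-then-open steps above can be wrapped in a small helper. This is only a sketch: the names open_matching_files and load_dataset are mine, not part of s3fs or xarray, and sorting the paths and raising on an empty match are additions I'm assuming you'd want, not something either library requires.

```python
def open_matching_files(fs, pattern):
    """Open every object on `fs` that matches `pattern`.

    `fs` only needs glob(pattern) and open(path), which
    s3fs.S3FileSystem provides. (Hypothetical helper name.)
    """
    # Sort so the files are opened in a predictable order
    paths = sorted(fs.glob(pattern))
    if not paths:
        # Fail early with a clear message when nothing matched
        raise FileNotFoundError(f"No objects match {pattern!r}")
    return [fs.open(path) for path in paths]


def load_dataset(pattern):
    """Combine all matching remote files into one dataset.

    (Hypothetical wrapper; same calls as the sample above.)
    """
    # Imported here so the helper above can be used without these packages
    import s3fs
    import xarray as xr

    fs = s3fs.S3FileSystem(anon=False)
    fileset = open_matching_files(fs, pattern)
    return xr.open_mfdataset(fileset, combine='by_coords')


# Example call (needs AWS credentials and a real bucket):
# data = load_dataset('s3://your-bucket/your-folder/file_prefix*')
```

Depending on your xarray version and which backends are installed, reading from file-like objects may need an explicit engine such as engine='h5netcdf' passed to xr.open_mfdataset; I haven't checked this against every version, so treat it as something to try if the default fails.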