python-xarray, opendap

xarray MissingDimensionsError when reading remote dataset (NBM)


When reading a remote NBM dataset (https://vlab.ncep.noaa.gov/web/mdl/nbm) I get an xarray.core.variable.MissingDimensionsError. I'm sure I'm missing some argument settings in open_dataset.

You can see the structure of the data here: https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD.html. The full structure can be dumped with ncdump -h https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD
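
If you prefer to inspect the structure from Python instead of ncdump, the netCDF4 library will print the same header:

import netCDF4

url = "https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD"
nc = netCDF4.Dataset(url)
print(nc)                   # full header, similar to ncdump -h
print(list(nc.dimensions))  # just the dimension names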

The error is first raised for the variables which use time1:

import xarray as xr
url = "https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD"
ds = xr.open_dataset(url)

If you drop that variable, the open then fails on the next time dimension:

ds = xr.open_dataset(url, drop_variables="time1")
xarray.core.variable.MissingDimensionsError: 'time2' has more than 1-dimension and the same name as one of its dimensions ('reftime4', 'time2'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.
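
In principle you could keep adding the offending 2-D time variables to drop_variables until the open succeeds, at the cost of losing the valid-time labels. A rough sketch (the exact set of names is an assumption; check the ncdump -h output and extend the list as needed):

import xarray as xr

url = "https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD"

# The 2-D time coordinate names are assumed here; keep adding them
# until MissingDimensionsError no longer appears
drop = ["time1", "time2", "time3", "time4", "time5", "time6"]
ds = xr.open_dataset(url, drop_variables=drop)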

Full traceback of the original error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py", line 575, in open_dataset
    ds = maybe_decode_store(store, chunks)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py", line 471, in maybe_decode_store
    ds = conventions.decode_cf(
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/conventions.py", line 600, in decode_cf
    ds = Dataset(vars, attrs=attrs)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/dataset.py", line 630, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/merge.py", line 467, in merge_data_and_coords
    return merge_core(
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/merge.py", line 594, in merge_core
    collected = collect_variables_and_indexes(aligned)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/merge.py", line 278, in collect_variables_and_indexes
    variable = as_variable(variable, name=name)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/variable.py", line 154, in as_variable
    raise MissingDimensionsError(
xarray.core.variable.MissingDimensionsError: 'time1' has more than 1-dimension and the same name as one of its dimensions ('reftime', 'time1'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.

You can also test locally by downloading one of the source GRIB2 files:

wget https://ftp.ncep.noaa.gov/data/nccf/com/blend/prod/blend.20210214/00/core/blend.t00z.core.f001.co.grib2
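
To open the downloaded file locally you need a GRIB engine such as cfgrib. A minimal sketch, assuming heightAboveGround is the level type you want (NBM blend files mix several level types, so some filter_by_keys value is usually required):

import xarray as xr

# Requires cfgrib (pip install cfgrib). The filter_by_keys choice is an
# assumption; pick the level type you actually need.
ds_local = xr.open_dataset(
    "blend.t00z.core.f001.co.grib2",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"typeOfLevel": "heightAboveGround"}},
)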

Solution

  • If you want to access these "TwoD" datasets from THREDDS Forecast Model Run Collection (FMRC) virtual datasets in xarray, you can slice them first with the netCDF4 library and then pass the sliced variable to xarray. If you wrap the netCDF4 variable with Dask, the slicing stays lazy.

    Here's an example of extracting a "best time series" for the last 60 values of the HRRR, but using 1-hour forecast data (instead of the default "analysis" 0-hour forecast you would get with the FMRC Best Time Series):

    import netCDF4
    import xarray as xr
    from dask import array as da
    import hvplot.xarray  # registers the .hvplot accessor on xarray objects

    url = 'https://thredds.unidata.ucar.edu/thredds/dodsC/grib/NCEP/HRRR/CONUS_2p5km/TwoD'
    nc = netCDF4.Dataset(url)

    # Wrap the netCDF4 variable in a dask array so slicing stays lazy
    arr = da.from_array(nc['Temperature_height_above_ground'])

    # Last 60 runs, forecast index tau=1 (1-hour forecast), first height level.
    # Renamed the result to avoid shadowing the dask.array alias "da" above.
    tau = 1
    temp = xr.DataArray(arr[-60:, tau, 0, :, :], dims=['time', 'y', 'x'], name='temp')
    

    Here's the time series plot to prove it worked: [time series plot of the extracted temperature]
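
    A minimal sketch of the plotting call (the grid-point indices are arbitrary and only illustrative):

    # Interactive time series at an arbitrary grid point; the integer index
    # is used for the x-axis since no time coordinate was attached
    temp[:, 500, 500].hvplot()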