metpypython-siphon

Why does the xarray reftime key suddenly have a 1 at the end?


I'm downloading GFS data from the Unidata Thredds server using siphon so I can plot it using MetPy. I wrote a script to do this and it worked perfectly yesterday:

#Get data using siphon
best_gfs = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/catalog.xml?dataset=grib/NCEP/GFS/Global_0p25deg/Best')
best_ds = best_gfs.datasets[0]
ncss = best_ds.subset()
query = ncss.query()
query.lonlat_box(north=55, south=20, east=-60, west=-120).time(datetime.utcnow())
query.accept('netcdf4')
query.variables('Geopotential_height_isobaric')

data = ncss.get_data(query)

#Parse data using MetPy
ds = xr.open_dataset(NetCDF4DataStore(data))
data = ds.metpy.parse_cf()

time_of_run = data['reftime'][0].dt.strftime('%Y%m%d_%H%MZ').values
print(time_of_run)

When I ran it around 2 PM EDT, this code outputted 2020-03-29 12:00Z and all was well.

When I ran it this morning, I got an error:

Traceback (most recent call last):
  File "C:\Users\jacks\Anaconda3\envs\metpy_test\lib\site-packages\xarray\core\dataset.py", line 1155, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'reftime'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "h5_rh_wind_gph_temp.py", line 51, in <module>
    time_of_run = data['reftime'][0].dt.strftime('%Y%m%d_%H%MZ').values
  File "C:\Users\jacks\Anaconda3\envs\metpy_test\lib\site-packages\xarray\core\dataset.py", line 1245, in __getitem__
    return self._construct_dataarray(key)
  File "C:\Users\jacks\Anaconda3\envs\metpy_test\lib\site-packages\xarray\core\dataset.py", line 1158, in _construct_dataarray
    self._variables, name, self._level_coords, self.dims
  File "C:\Users\jacks\Anaconda3\envs\metpy_test\lib\site-packages\xarray\core\dataset.py", line 165, in _get_virtual_variable
    ref_var = variables[ref_name]
KeyError: 'reftime'

which suggests that the reference to the 'reftime' key is invalid. To investigate, I then printed the 'data' xarray:

<xarray.Dataset>
Dimensions:                       (isobaric6: 34, lat: 141, lon: 241, time1: 1)
Coordinates:
    reftime1                      (time1) datetime64[ns] ...
  * time1                         (time1) datetime64[ns] 2020-03-30T12:00:00
  * isobaric6                     (isobaric6) float32 40.0 100.0 ... 100000.0
  * lat                           (lat) float32 55.0 54.75 54.5 ... 20.25 20.0
  * lon                           (lon) float32 240.0 240.25 ... 299.75 300.0
    crs                           object Projection: latitude_longitude
Data variables:
    Geopotential_height_isobaric  (time1, isobaric6, lat, lon) float32 ...
    LatLon_Projection             int32 ...
Attributes:
    Originating_or_generating_Center:                                        ...
    Originating_or_generating_Subcenter:                                     ...
    GRIB_table_version:                                                      ...
    Type_of_generating_process:                                              ...
    Analysis_or_forecast_generating_process_identifier_defined_by_originating...
    Conventions:                                                             ...
    history:                                                                 ...
    featureType:                                                             ...
    History:                                                                 ...
    geospatial_lat_min:                                                      ...
    geospatial_lat_max:                                                      ...
    geospatial_lon_min:                                                      ...
    geospatial_lon_max:                                                      ...

which indicates that the information I want (the model's run time) is now stored as 'reftime1'. Why does this 1 suddenly appear at the end of the reftime key? Does its appearance happen with any regularity? I'm hoping to eventually run this script as a cron job to automatically generate plots so it would be nice to figure out a way to either anticipate this change in name or to circumvent the keyname entirely.


Solution

  • The change from reftime to reftime1 comes from how THREDDS and netCDF-java handle making a netCDF representation of the underlying GRIB data. GRIB fundamentally arrive as individual 2D slices of data. To create the time, reftime, and various vertical dimensions, netCDF-java is looking at the set of GRIB messages available for a given field (e.g. Geopotential_height_isobaric). If fields have different sets of times/vertical dimensions, then separate dimensions are created with unique names e.g. reftime, reftime1, reftime2. Which field ends up with which dimension name depends on the order in which netCDF-java encounters a particular GRIB message in the collection.

    The way to work with this is to avoid depending on the names, but instead use the metadata to figure out what you're looking for. MetPy can do some of this by giving you appropriate aliases for the various dimensions/coordinates:

    # Will point to appropriate time1, time2, etc.
    time = data.Geopotential_height_isobaric.metpy.time
    

    That works for coordinates for a particular variable. In the case of reftime, since that's not a coordinate for the variable, you can also look that up by looking for the Climate and Forecasting (CF) metadata standard name of forecast_reference_time:

    filtered_ds = data_.filter_by_attrs(standard_name='forecast_reference_time')
    

    That still leaves an xarray Dataset that you need to find some way to pull out the only variable inside--I'm not sure what the best way to go down from there is.