pythondatasetpython-xarraygribcfgrib

Xarray (from grib file) to dataset


I have a grib file containing monthly precipitation and temperature from 1989 to 2018 (extracted from ERA5-Land).

I need to have those data in a dataset format with 6 column : longitude, latitude, ID of the cell/point in the grib file, date, temperature and precipitation.

I first imported the file using cfgrib. Here is what contains the xdata list after importation:

import cfgrib

grib_data = cfgrib.open_datasets('\era5land_extract.grib')

grib_data
Out[6]: 
[<xarray.Dataset>
 Dimensions:     (latitude: 781, longitude: 761, time: 372)
 Coordinates:
     number      int32 0
   * time        (time) datetime64[ns] 1989-01-01 1989-02-01 ... 2019-12-01
     step        timedelta64[ns] 1 days
     surface     float64 0.0
   * latitude    (latitude) float64 42.0 41.9 41.8 41.7 ... -35.8 -35.9 -36.0
   * longitude   (longitude) float64 -21.0 -20.9 -20.8 -20.7 ... 54.8 54.9 55.0
     valid_time  (time) datetime64[ns] ...
 Data variables:
     t2m         (time, latitude, longitude) float32 ...
 Attributes:
     GRIB_edition:            1
     GRIB_centre:             ecmf
     GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
     GRIB_subCentre:          0
     Conventions:             CF-1.7
     institution:             European Centre for Medium-Range Weather Forecasts,
 <xarray.Dataset>
 Dimensions:     (latitude: 781, longitude: 761, time: 156)
 Coordinates:
     number      int32 0
   * time        (time) datetime64[ns] 1989-01-01 1989-02-01 ... 2001-12-01
     step        timedelta64[ns] 1 days
     surface     float64 0.0
   * latitude    (latitude) float64 42.0 41.9 41.8 41.7 ... -35.8 -35.9 -36.0
   * longitude   (longitude) float64 -21.0 -20.9 -20.8 -20.7 ... 54.8 54.9 55.0
     valid_time  (time) datetime64[ns] ...
 Data variables:
     tp          (time, latitude, longitude) float32 ...
 Attributes:
     GRIB_edition:            1
     GRIB_centre:             ecmf
     GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
     GRIB_subCentre:          0
     Conventions:             CF-1.7
     institution:             European Centre for Medium-Range Weather Forecasts,
 <xarray.Dataset>
 Dimensions:     (latitude: 781, longitude: 761, time: 216)
 Coordinates:
     number      int32 0
   * time        (time) datetime64[ns] 2002-01-01 2002-02-01 ... 2019-12-01
     step        timedelta64[ns] 1 days
     surface     float64 0.0
   * latitude    (latitude) float64 42.0 41.9 41.8 41.7 ... -35.8 -35.9 -36.0
   * longitude   (longitude) float64 -21.0 -20.9 -20.8 -20.7 ... 54.8 54.9 55.0
     valid_time  (time) datetime64[ns] ...
 Data variables:
     tp          (time, latitude, longitude) float32 ...
 Attributes:
     GRIB_edition:            1
     GRIB_centre:             ecmf
     GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
     GRIB_subCentre:          0
     Conventions:             CF-1.7
     institution:             European Centre for Medium-Range Weather Forecasts]

So the temperature variable is called "t2m" and the precipitation variable "tp". Temperature variable is split in two xarrays but I don't understand why.

How can I obtain the needed dataset from this please ?

It's the first time I'm dealing with such data, and I'm really lost on how to proceed.


Solution

  • Here is the answer after a bit of trial and error (only putting the result for tp variable but it's similar for t2m)

    import cfgrib
    import xarray as xr
    
    
    # Import data
    grib_data = cfgrib.open_datasets('\era5land_extract.grib')
    
    
    # Merge both tp arrays into one on the time dimension
    grib_precip = xr.merge([grib_data[1], grib_data[2]])
    
    
    # Aggregate data by year
    grib_precip_year = grib_precip.resample(time="Y", skipna=True).mean()
    
    
    # Data from xarray to pandas
    grib_precip_pd = grib_precip_year.to_dataframe()