
How to select specific data variables from an xarray Dataset


BACKGROUND

I am trying to download GFS weather data as netCDF4 files via xarray and OPeNDAP. Big thanks to Vorticity0123 for their earlier post, which helped me get the bones of the Python script below sorted.

PROBLEM

The thing is, the GFS dataset has 195 data variables, but I don't need most of them; I only need ten.

HELP REQUESTED

I've gone through the xarray documentation on Read the Docs and looked elsewhere, but I couldn't figure out how to narrow my dataset down to just those ten data variables. Does anyone know how to reduce the list of variables in a dataset?

PYTHON SCRIPT

import numpy as np
import xarray as xr

# File Details
dt = '20201124'
res = 25
step = '1hr'
run = '{:02}'.format(18)

# URL
URL = f'http://nomads.ncep.noaa.gov:80/dods/gfs_0p{res}_{step}/gfs{dt}/gfs_0p{res}_{step}_{run}z'

# Load data
dataset = xr.open_dataset(URL)
time = dataset.variables['time']
lat = dataset.variables['lat'][:]
lon = dataset.variables['lon'][:]
lev = dataset.variables['lev'][:]

# Narrow Down Selection
time_toplot = time
lat_toplot = np.arange(-43, -17, 0.5)
lon_toplot = np.arange(135, 152, 0.5)
lev_toplot = np.array([1000])  # defined but not used in the .sel() call below

# Select required data via xarray
dataset = dataset.sel(time=time_toplot, lon=lon_toplot, lat=lat_toplot)
print(dataset)
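For comparison, here is a small offline sketch of the coordinate selection step. It builds a synthetic dataset on a 0.25-degree grid as a stand-in for the GFS data (an assumption, since the live OPeNDAP URL is not used) and contrasts the exact-label arrays used above with label slices, which keep the native grid spacing and include both endpoints.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the 0.25-degree GFS grid (synthetic data, no OPeNDAP access).
lat = np.arange(-90, 90.25, 0.25)
lon = np.arange(0, 360, 0.25)
ds = xr.Dataset(
    {"tmp2m": (("lat", "lon"), np.zeros((lat.size, lon.size), dtype="float32"))},
    coords={"lat": lat, "lon": lon},
)

# Option 1 (as in the question): pass arrays of exact labels.
# Every value must exist on the grid, otherwise .sel() raises a KeyError.
by_labels = ds.sel(lat=np.arange(-43, -17, 0.5), lon=np.arange(135, 152, 0.5))

# Option 2: pass label slices. Both ends are inclusive and the native
# 0.25-degree spacing is kept; the slice order must match the coordinate
# order (lat is ascending here, as in the printed GFS dataset).
by_slices = ds.sel(lat=slice(-43, -17), lon=slice(135, 152))

print(by_labels.sizes)
print(by_slices.sizes)
```

Note that the label arrays subsample every other grid point (0.5-degree steps) and stop short of -17 and 152, because `np.arange` excludes its end value.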

Solution

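  • Use xarray's dict-like indexing: index the Dataset with a list of variable names, and it returns a new Dataset containing only those data variables plus the coordinates they use.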

    variables = [
        'ugrd100m',
        'vgrd100m',
        'dswrfsfc',
        'tcdcclm',
        'tcdcblcll',
        'tcdclcll',
        'tcdcmcll',
        'tcdchcll',
        'tmp2m',
        'gustsfc'
    ]
    
    
    dataset[variables]
    

    Gives you:

    <xarray.Dataset>
    Dimensions:    (lat: 721, lon: 1440, time: 121)
    Coordinates:
      * time       (time) datetime64[ns] 2020-11-24T18:00:00 ... 2020-11-29T18:00:00
      * lat        (lat) float64 -90.0 -89.75 -89.5 -89.25 ... 89.25 89.5 89.75 90.0
      * lon        (lon) float64 0.0 0.25 0.5 0.75 1.0 ... 359.0 359.2 359.5 359.8
    Data variables:
        ugrd100m   (time, lat, lon) float32 ...
        vgrd100m   (time, lat, lon) float32 ...
        dswrfsfc   (time, lat, lon) float32 ...
        tcdcclm    (time, lat, lon) float32 ...
        tcdcblcll  (time, lat, lon) float32 ...
        tcdclcll   (time, lat, lon) float32 ...
        tcdcmcll   (time, lat, lon) float32 ...
        tcdchcll   (time, lat, lon) float32 ...
        tmp2m      (time, lat, lon) float32 ...
        gustsfc    (time, lat, lon) float32 ...
    Attributes:
        title:        GFS 0.25 deg starting from 18Z24nov2020, downloaded Nov 24 ...
        Conventions:  COARDS\nGrADS
        dataType:     Grid
        history:      Sat Nov 28 05:52:44 GMT 2020 : imported by GrADS Data Serve...
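A minimal offline sketch of the answer above, using a small synthetic dataset in which the hypothetical placeholder variables `other_var1` and `other_var2` stand in for the remaining GFS fields. It shows list indexing, an equivalent `drop_vars` call, and how the variable subset chains with `.sel()` for the region from the question.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the GFS dataset. "other_var1" and "other_var2"
# are hypothetical placeholders for the variables you do not need.
time = np.array(["2020-11-24T18:00", "2020-11-24T19:00"], dtype="datetime64[ns]")
lat = np.arange(-45, -15, 0.25)
lon = np.arange(130, 155, 0.25)
shape = (time.size, lat.size, lon.size)
names = ["ugrd100m", "vgrd100m", "tmp2m", "gustsfc", "other_var1", "other_var2"]
ds = xr.Dataset(
    {name: (("time", "lat", "lon"), np.zeros(shape, dtype="float32")) for name in names},
    coords={"time": time, "lat": lat, "lon": lon},
)

variables = ["ugrd100m", "vgrd100m", "tmp2m", "gustsfc"]

# Dict-like indexing with a list returns a new Dataset holding only those
# data variables plus the coordinates they use.
subset = ds[variables]

# Equivalent result: drop every data variable that is not wanted.
dropped = ds.drop_vars([v for v in ds.data_vars if v not in variables])

# Variable selection chains with label-based selection of coordinates.
region = ds[variables].sel(lat=slice(-43, -17), lon=slice(135, 152))

print(list(subset.data_vars))
```

With an OPeNDAP URL, `xr.open_dataset` reads the metadata up front and fetches values lazily, so subsetting variables and coordinates before calling `.load()` or `.to_netcdf()` should limit how much data is transferred. If you would rather never see the other variables, `xr.open_dataset` also accepts a `drop_variables` argument, though you would then need to list every name you do not want.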