
How to select from xarray.Dataset without hardcoding the name of the dimension?


When selecting data from an xarray.Dataset, the examples in the documentation all hardcode the name of the dimension, like so:

ds = ds.sel(state_name='California')

TL;DR: How can you select from a dataset without hardcoding the dimension name? How would I achieve something like this, given that the code below doesn't work?

dimName = 'state_name'
ds = ds.sel(dimName='California')

I have a situation where I won't know the name of the dimension to make my selection on until runtime of the application, but I can't figure out how to select the data with xarray's methods unless I know the dimension name ahead of time. For instance, let's say I have a dataset like this, where dim2, dim3, and dim4 all correspond to ID numbers of different spatial bounds that a user could select on a map:

import xarray as xr
import numpy as np

dim2 = ['12', '34', '56', '78']
dim3 = ['121', '341', '561', '781']
dim4 = ['1211', '3411', '5611', '7811']
time_mn = np.arange(1, 61)

ds1 = xr.Dataset(
    data_vars={
        'prcp_dim2': (['dim2', 'time_mn'], np.random.rand(len(dim2), len(time_mn))),
        'prcp_dim3': (['dim3', 'time_mn'], np.random.rand(len(dim3), len(time_mn))),
        'prcp_dim4': (['dim4', 'time_mn'], np.random.rand(len(dim4), len(time_mn))),
    },
    coords={
        'dim2': (['dim2'], dim2),
        'dim3': (['dim3'], dim3),
        'dim4': (['dim4'], dim4),
        'time_mn': (['time_mn'], time_mn)
    }
)

print(ds1)
<xarray.Dataset> Size: 6kB
Dimensions:    (dim2: 4, time_mn: 60, dim3: 4, dim4: 4)
Coordinates:
  * dim2       (dim2) <U2 32B '12' '34' '56' '78'
  * dim3       (dim3) <U3 48B '121' '341' '561' '781'
  * dim4       (dim4) <U4 64B '1211' '3411' '5611' '7811'
  * time_mn    (time_mn) int64 480B 1 2 3 4 5 6 7 8 ... 53 54 55 56 57 58 59 60
Data variables:
    prcp_dim2  (dim2, time_mn) float64 2kB 0.8804 0.2733 ... 0.3227 0.4637
    prcp_dim3  (dim3, time_mn) float64 2kB 0.1391 0.4541 ... 0.1688 0.3271
    prcp_dim4  (dim4, time_mn) float64 2kB 0.4784 0.6666 ... 0.3619 0.4864

Now let's say a map is presented to a user and the user chooses ID 78 to calculate something from the dataset. From this ID, I can work out that the dimension 78 belongs to is dim2. How would I then make a selection on the xarray dataset where dim2=78 without hardcoding dim2?

selectedID = request.get('id') #This is the user's choice, let's say they chose '78'.

#Get the dimension name the selectedID belongs to
if len(selectedID) == 2:
  selectedDimension = 'dim2'
elif len(selectedID) == 3:
  selectedDimension = 'dim3'
elif len(selectedID) == 4:
  selectedDimension = 'dim4'

#This is what I want to be able to do, but it does not work
ds = ds.sel(selectedDimension=selectedID)

Is there a way to select the data without hardcoding the dimension name?

Edit: I realize there is a solution like the one below, but it falls apart if I want to move the if/else above into a reusable function, because I may call that function elsewhere without wanting to select the data at that point.

if len(selectedID) == 2:
  ds = ds.sel(dim2=selectedID)
elif len(selectedID) == 3:
  ds = ds.sel(dim3=selectedID)
elif len(selectedID) == 4:
  ds = ds.sel(dim4=selectedID)

Solution

  • This is a nice place to use Python dictionary unpacking.

    To get this:

    res = ds.sel(state_name='California')
    

    You can write:

    dim_sel = {'state_name': 'California'}
    res = ds.sel(**dim_sel)
    

    Or, directly:

    res = ds.sel(**{'state_name': 'California'})
    

    Unpacking the dictionary with ** passes its keys as argument names and its values as the argument values. This works anywhere in Python where you need to pass named arguments; it is not specific to xarray.

    Since you can construct dictionaries on the fly with strings as keys, you are no longer limited to literal identifiers as parameter names.

    Your example where you select a dimension based on the length of some value would work out to:

    dim_lookup = {
        2: 'dim2',
        3: 'dim3',
        4: 'dim4'
    }
    res = ds.sel(**{dim_lookup[len(selectedID)]: selectedID})
    

    Note that this assumes there will be a key for every possible length of selectedID (any other length raises a KeyError), but I'm sure you can see how to make this more robust. Also note that I assign to res instead of ds because I'm not sure you actually want to overwrite the original dataset reference with your selection.
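
To address the edit in the question (deciding on the dimension in one place and selecting later), here is a minimal sketch. It assumes the length-to-dimension mapping from the question; the names DIM_BY_ID_LENGTH, indexer_for, and show_kwargs are hypothetical and only for illustration. The helper only builds the dictionary and never touches a dataset, and show_kwargs stands in for any function that accepts keyword arguments.

```python
# Hypothetical lookup table mirroring the question's dim2/dim3/dim4 layout.
DIM_BY_ID_LENGTH = {2: 'dim2', 3: 'dim3', 4: 'dim4'}


def indexer_for(selected_id):
    """Return {dimension_name: selected_id} without selecting anything."""
    try:
        dim_name = DIM_BY_ID_LENGTH[len(selected_id)]
    except KeyError:
        # Fail with a clearer message than a bare KeyError on the length.
        raise ValueError(f'no dimension known for ID {selected_id!r}') from None
    return {dim_name: selected_id}


def show_kwargs(**kwargs):
    """Stand-in for any function taking keyword arguments."""
    return kwargs


indexer = indexer_for('78')
print(indexer)                 # {'dim2': '78'}
print(show_kwargs(**indexer))  # {'dim2': '78'}
```

With the dataset from the question, the result could then be used as res = ds1.sel(**indexer_for(selectedID)). Dataset.sel also accepts the dictionary as its first positional argument, so res = ds1.sel(indexer_for(selectedID)) should behave the same way.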