python, python-xarray, weather

Efficient way to subtract yearly mean data from monthly data in Xarray?


Suppose I have the following xarray DataArray:

>>> da
<xarray.DataArray 'precip' (time: 521, lat: 72, lon: 144)> Size: 22MB
[5401728 values with dtype=float32]
Coordinates:
  * lat      (lat) float32 288B 88.75 86.25 83.75 81.25 ... -83.75 -86.25 -88.75
  * lon      (lon) float32 576B 1.25 3.75 6.25 8.75 ... 351.2 353.8 356.2 358.8
  * time     (time) datetime64[ns] 4kB 1979-01-01 1979-02-01 ... 2022-05-01

which contains monthly mean data from 1979 to 2022. I have calculated the yearly mean data from this data:

>>> yearly_mean=da.resample({'time':'YS'}).mean()
>>> print(yearly_mean)
<xarray.DataArray 'precip' (time: 44, lat: 72, lon: 144)> Size: 2MB
array([[[ 0.5208333 ,  0.5141667 ,  0.51      , ...,  0.54833335,                                                                                 
          0.53833336,  0.5283333 ],                                                                                                               
        [ 0.4725    ,  0.4766667 ,  0.4883333 , ...,  0.48499998,                                                                                 
          0.47416666,  0.47166666],                                                                                                               
        [ 0.6399999 ,  0.68      ,  0.7308333 , ...,  0.5975    ,                                                                                 
          0.6016666 ,  0.61333334],                                                                                                               
        ...,                                                                                                                                      
        [ 0.05666666,  0.05833334,  0.05833334, ...,  0.05416666,                                                                                 
          0.05416666,  0.05083333],                                                                                                               
        [ 0.0725    ,  0.07166667,  0.07166667, ...,  0.07416666,                                                                                 
          0.0725    ,  0.06583334],                                                                                                               
        [ 0.11333334,  0.11166667,  0.11083334, ...,  0.11583333,                                                                                 
          0.115     ,  0.10166666]],                                                                                                              
                                                                                                                                                  
       [[ 0.5125    ,  0.50916666,  0.505     , ...,  0.53416663,                                                                                 
          0.525     ,  0.5183333 ],                                                                                                               
        [ 0.43249997,  0.43916664,  0.45000002, ...,  0.4358333 ,                                                                                 
          0.4308333 ,  0.42999998],                                                                                                               
        [ 0.5983333 ,  0.6266666 ,  0.6716667 , ...,  0.5758334 ,                                                                                 
          0.5783333 ,  0.5841667 ],                                                                                                               
...                                                                                                                                               
        [ 0.13250001,  0.10916666,  0.09833334, ...,  0.21333332,                                                                                 
          0.1875    ,  0.16      ],                                                                                                               
        [ 0.07666666,  0.07583333,  0.075     , ...,  0.0775    ,                                                                                 
          0.0775    ,  0.07666666],                                                                                                               
        [ 0.06333333,  0.06333333,  0.06166667, ...,  0.06416667,
          0.06333333,  0.06333333]],

       [[ 0.35200003,  0.34199998,  0.336     , ...,  0.38599998,
          0.374     ,  0.36200002],
        [ 0.42599997,  0.444     ,  0.45999998, ...,  0.394     ,
          0.40399998,  0.41399997],
        [ 0.49      ,  0.528     ,  0.582     , ...,  0.43800002,
          0.45      ,  0.468     ],
        ...,
        [ 1.8119999 ,  2.0379999 ,  2.35      , ...,  1.436     ,
          1.5239999 ,  1.6200001 ],
        [ 6.03      ,  6.708     ,  7.484     , ...,  4.4660006 ,
          4.9140005 ,  5.4300003 ],
        [13.785998  , 14.286     , 14.806     , ..., 12.434     ,
         12.855998  , 13.309999  ]]], dtype=float32)
Coordinates:
  * lat      (lat) float32 288B 88.75 86.25 83.75 81.25 ... -83.75 -86.25 -88.75
  * lon      (lon) float32 576B 1.25 3.75 6.25 8.75 ... 351.2 353.8 356.2 358.8
  * time     (time) datetime64[ns] 352B 1979-01-01 1980-01-01 ... 2022-01-01

I want to calculate the difference between the monthly means and the corresponding yearly mean. If there were only a single year of data, I could simply compute da-yearly_mean. With multiple years this does not work: Xarray completes the operation without raising an error, but the results are incorrect, and even the shape of the resulting DataArray is not what I expected:

>>> d=da-yearly_mean
>>> print(d)
<xarray.DataArray 'precip' (time: 44, lat: 72, lon: 144)> Size: 2MB
array([[[-3.10833335e-01, -3.04166734e-01, -3.00000012e-01, ...,                                                                                  
         -3.18333328e-01, -3.18333358e-01, -3.08333308e-01],                                                                                      
        [-3.52499992e-01, -3.46666694e-01, -3.38333309e-01, ...,                                                                                  
         -3.64999980e-01, -3.64166677e-01, -3.61666679e-01],                                                                                      
        [-4.19999927e-01, -4.30000007e-01, -4.30833280e-01, ...,                                                                                  
         -3.97500038e-01, -4.01666641e-01, -4.13333356e-01],                                                                                      
        ...,                                                                                                                                      
        [ 1.33333392e-02,  2.16666609e-02,  2.16666609e-02, ...,                                                                                  
          5.83333522e-03,  5.83333522e-03,  9.16666538e-03],                                                                                      
        [-1.24999993e-02, -1.16666667e-02, -1.16666667e-02, ...,                                                                                  
         -2.41666622e-02, -2.24999972e-02, -1.58333369e-02],                                                                                      
        [-3.33333388e-02, -4.16666716e-02, -4.08333391e-02, ...,                                                                                  
         -3.58333364e-02, -3.50000039e-02, -3.16666588e-02]],                                                                                     
                                                                                                                                                  
       [[-1.92499995e-01, -1.99166656e-01, -1.94999993e-01, ...,
         -1.84166640e-01, -1.84999973e-01, -1.88333303e-01],
        [-1.02499962e-01, -1.19166642e-01, -1.30000025e-01, ...,
         -5.58333099e-02, -6.08333051e-02, -7.99999833e-02],
        [ 1.66672468e-03, -3.66666317e-02, -7.16666579e-02, ...,
          6.41666055e-02,  6.16666675e-02,  3.58332992e-02],
...
         -6.33333176e-02, -7.75000006e-02, -6.99999928e-02],
        [-7.66666606e-02, -7.58333281e-02, -7.49999955e-02, ...,
         -7.75000006e-02, -7.75000006e-02, -7.66666606e-02],
        [-6.33333325e-02, -6.33333325e-02, -6.16666675e-02, ...,
         -6.41666725e-02, -6.33333325e-02, -6.33333325e-02]],

       [[-2.00003386e-03, -1.99997425e-03, -5.99998236e-03, ...,
          1.40000284e-02,  5.99998236e-03,  7.99998641e-03],
        [ 2.04000026e-01,  2.05999970e-01,  1.89999998e-01, ...,
          1.85999990e-01,  1.96000040e-01,  2.06000030e-01],
        [ 4.20000017e-01,  4.42000031e-01,  4.37999964e-01, ...,
          3.81999969e-01,  4.00000036e-01,  4.12000000e-01],
        ...,
        [ 6.08000159e-01,  1.19200015e+00,  1.97000027e+00, ...,
         -3.76000047e-01, -1.43999934e-01,  1.29999876e-01],
        [ 1.02699986e+01,  1.19820004e+01,  1.39359999e+01, ...,
          6.39399910e+00,  7.50599957e+00,  8.78999996e+00],
        [ 2.75340004e+01,  2.88440018e+01,  3.02239990e+01, ...,
          2.39559994e+01,  2.50839996e+01,  2.62800007e+01]]],
      dtype=float32)
Coordinates:
  * lat      (lat) float32 288B 88.75 86.25 83.75 81.25 ... -83.75 -86.25 -88.75
  * lon      (lon) float32 576B 1.25 3.75 6.25 8.75 ... 351.2 353.8 356.2 358.8
  * time     (time) datetime64[ns] 352B 1979-01-01 1980-01-01 ... 2022-01-01

The shape of the output should be the same as da (monthly data).

The following is a crude method of doing what I want using a for-loop:

>>> years=yearly_mean.indexes['time'].year
>>> d=da.copy()
>>> for idx, year in enumerate(years):
...     d[idx*12:(idx*12)+12]=da[idx*12:(idx*12)+12]-yearly_mean[idx]
>>> print(d)
<xarray.DataArray 'precip' (time: 521, lat: 72, lon: 144)> Size: 22MB                                                                             
array([[[-3.108333e-01, -3.041667e-01, ..., -3.183334e-01, -3.083333e-01],                                                                        
        [-3.525000e-01, -3.466667e-01, ..., -3.641667e-01, -3.616667e-01],                                                                        
        ...,                                                                                                                                      
        [-1.250000e-02, -1.166667e-02, ..., -2.250000e-02, -1.583334e-02],                                                                        
        [-3.333334e-02, -4.166667e-02, ..., -3.500000e-02, -3.166666e-02]],                                                                       
                                                                                                                                                  
       [[-2.208333e-01, -2.141667e-01, ..., -2.283334e-01, -2.283333e-01],                                                                        
        [-2.125000e-01, -2.066667e-01, ..., -2.241667e-01, -2.216667e-01],                                                                        
        ...,                                                                                                                                      
        [-6.250000e-02, -6.166667e-02, ..., -6.250000e-02, -5.583334e-02],                                                                        
        [-1.033333e-01, -1.016667e-01, ..., -1.050000e-01, -9.166666e-02]],

       ...,

       [[-1.620000e-01, -1.620000e-01, ..., -1.840000e-01, -1.720000e-01],
        [-2.860000e-01, -2.940000e-01, ..., -2.740000e-01, -2.840000e-01],
        ...,
        [-2.080000e+00, -2.828000e+00, ..., -8.640003e-01, -1.430000e+00],
        [-1.093600e+01, -1.149600e+01, ..., -9.905998e+00, -1.041000e+01]],

       [[-2.000034e-03, -1.999974e-03, ...,  5.999982e-03, -2.000004e-03],
        [-1.360000e-01, -1.440000e-01, ..., -1.140000e-01, -1.240000e-01],
        ...,
        [-5.680000e+00, -6.368000e+00, ..., -4.564001e+00, -5.080000e+00],
        [-1.351600e+01, -1.401600e+01, ..., -1.258600e+01, -1.304000e+01]]],
      dtype=float32)
Coordinates:
  * lat      (lat) float32 288B 88.75 86.25 83.75 81.25 ... -83.75 -86.25 -88.75
  * lon      (lon) float32 576B 1.25 3.75 6.25 8.75 ... 351.2 353.8 356.2 358.8
  * time     (time) datetime64[ns] 4kB 1979-01-01 1979-02-01 ... 2022-05-01
Attributes:
    long_name:     Average Monthly Rate of Precipitation
    valid_range:   [ 0. 70.]
    units:         mm/day
    precision:     2
    var_desc:      Precipitation
    dataset:       CPC Merged Analysis of Precipitation Enhanced
    level_desc:    Surface
    statistic:     Mean
    parent_stat:   Mean
    actual_range:  [  0.   144.49]

That is, I want to subtract from each monthly value in da the annual mean of the corresponding year, which is stored in yearly_mean.

Is there a way to do it efficiently in Xarray, instead of using a for-loop?


Solution

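  • You could use DataArray.reindex to expand the yearly means onto the monthly time axis. With method="ffill", each month takes the most recent yearly value at or before it, which is the mean of its own year. A possible solution could look like: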

    ds_diff = da - yearly_mean.reindex(time=da.time, method="ffill")
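
As a rough check of the reindex approach, and as one possible alternative, the sketch below builds a small synthetic array (the grid size, random values, and variable names such as anom_reindex and anom_groupby are assumptions for illustration, not the asker's data). It also shows why the plain subtraction returns only 44 time steps: xarray aligns the two arrays on their shared time labels before subtracting, and only the 1 January timestamps appear in both. The alternative groups the monthly data by calendar year and subtracts each group's mean.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the question's data: 521 monthly steps, tiny grid.
time = pd.date_range("1979-01-01", periods=521, freq="MS")
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.random((time.size, 3, 4)).astype("float32"),
    dims=("time", "lat", "lon"),
    coords={"time": time},
    name="precip",
)

yearly_mean = da.resample(time="YS").mean()

# Plain subtraction keeps only the time labels present in both arrays.
naive = da - yearly_mean

# Approach from the answer: forward-fill yearly means onto the monthly axis.
anom_reindex = da - yearly_mean.reindex(time=da.time, method="ffill")

# Alternative: group by calendar year and subtract each group's mean.
by_year = da.groupby("time.year")
anom_groupby = (by_year - by_year.mean("time")).transpose("time", "lat", "lon")

# Independent NumPy reference, equivalent to the loop in the question.
years = da["time"].dt.year.values
reference = da.values.copy()
for y in np.unique(years):
    mask = years == y
    reference[mask] -= da.values[mask].mean(axis=0)

print(naive.sizes["time"], anom_reindex.sizes["time"], anom_groupby.sizes["time"])
```

Unlike the loop, neither variant assumes that every year has exactly 12 time steps, so both should also cope with the incomplete final year. The groupby result carries an extra year coordinate, which can be removed with drop_vars("year") if it is not wanted.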