python python-xarray netcdf

Xarray gives wrong shape of result when adding two netcdf files of equal dimensions


I have multiple NetCDF datasets, each containing a single variable called t2m with three dimensions (longitude: 38, latitude: 35, time: 1). Using xarray, I want to add the datasets to obtain the sum of the variable for each geographic cell. Below is the information on just two of the datasets, i_january90 and i_february89:

i_january90 info

i_february89 info

However, when I add the datasets, the resulting output, I, has dimensions (longitude: 38, latitude: 35, time: 0).

I info

Below is the sample code for just 2 of the files which gives the same result:

import xarray as xr
import numpy as np

# In a notebook: %cd (Path to my drive working folder)

i_january90 = xr.open_dataset("i_january90.nc")
i_february89 = xr.open_dataset("i_february89.nc")

# Add the 2 files
I = i_january90 + i_february89

Both files have the same dimensions, and their spatial extent is also the same. I confirmed that each file contains missing values (nan) and tried the code below to add them, but the dimensions of the result were still longitude: 38, latitude: 35, time: 0.

I = xr.where(i_january90.notnull() & i_february89.notnull(), i_january90+i_february89, np.nan)

Would appreciate any ideas on what the problem and possible solution could be. Thanks


Solution

  • If you look at the time coordinate of your two files, I'd guess that the times are different (one would reflect that it's a Jan file and the other that it's a Feb file). When you add, xarray aligns the two datasets on their coordinates and only adds values where the coordinate labels match. Since the two time values have nothing in common, the result keeps no time steps at all, which is why you get time: 0. This alignment is often useful because you normally don't want to add mismatched points, but in this case it's causing a problem. Think about it this way: what should the time coordinate of the output be? It's unclear because it could be Jan or Feb.

    For this case, one work-around would be to drop the time dimension and then add.

    # Select the single time step, then remove the leftover scalar time coordinate
    jan_noTime = i_january90.isel({'time': 0}).drop_vars('time')
    feb_noTime = i_february89.isel({'time': 0}).drop_vars('time')
    janfeb_sum = jan_noTime + feb_noTime
    

    I hope that answers the question.
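For anyone who wants to reproduce this without the original files, here is a minimal sketch using synthetic stand-in data. The helper `make_month`, the timestamps, and the fill values are assumptions made up for illustration; only the variable name t2m and the shape come from the question. It shows the empty time dimension from plain addition, the drop-time workaround, and one possible alternative for many files: concatenating along time and summing over that dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_month(timestamp, fill):
    """Hypothetical helper: build a dataset shaped like the ones in the question."""
    data = np.full((38, 35, 1), fill, dtype="float64")
    return xr.Dataset(
        {"t2m": (("longitude", "latitude", "time"), data)},
        coords={
            "longitude": np.arange(38),
            "latitude": np.arange(35),
            "time": pd.to_datetime([timestamp]),
        },
    )


# Stand-ins for i_january90 and i_february89 (timestamps are assumed)
jan = make_month("1990-01-01", 1.0)
feb = make_month("1989-02-01", 2.0)

# Plain addition aligns on coordinate labels. The two time labels do not
# overlap, so the time dimension of the result is empty.
naive = jan + feb
naive_time_len = naive.sizes["time"]

# Workaround from the answer: select the single time step and drop the
# leftover scalar time coordinate before adding.
direct = jan.isel(time=0, drop=True) + feb.isel(time=0, drop=True)

# Alternative for many files: stack along time, then sum over that dimension.
# skipna=False keeps NaN wherever any input is NaN, like plain addition does.
stacked = xr.concat([jan, feb], dim="time")
total = stacked.sum(dim="time", skipna=False)
```

The concat-and-sum route keeps each file's timestamp until the final reduction, which may be convenient if the same datasets are later needed for time-based selection.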