
Effective way to add a "time" dimension to a two-dimensional (x, y) NetCDF file whose variables represent the same quantity at different times


I had a CSV file with x and y coordinates, plus the values of one variable at three different time steps:

x, y, var_t1, var_t2, var_t3
1, 1, 8,      8,      6
1, 2, 6,      1,      2
2, 1, 5,      3,      7
2, 2, 7,      2,      6

I learned to create a NetCDF file from it with the following method:

import xarray as xr

# df is the CSV above loaded into a pandas DataFrame (e.g. with pd.read_csv)
xr.Dataset.from_dataframe(df.set_index(['x', 'y'])).to_netcdf('filename.nc')
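For reference, here is a self-contained version of that step. It is a sketch that inlines the CSV (with the trailing comma in the header dropped) and reads it with pandas.read_csv(..., skipinitialspace=True), so the blanks after the commas do not end up in the column names. Instead of writing a file it only inspects the resulting Dataset, which shows the three separate variables:

```python
import io

import pandas as pd
import xarray as xr

# The CSV from the question, inlined so the example runs on its own.
csv_text = """x, y, var_t1, var_t2, var_t3
1, 1, 8, 8, 6
1, 2, 6, 1, 2
2, 1, 5, 3, 7
2, 2, 7, 2, 6
"""
# skipinitialspace=True strips the space that follows each comma.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# x and y become dimensions; every var_t* column becomes its own variable.
ds = xr.Dataset.from_dataframe(df.set_index(["x", "y"]))
print(dict(ds.sizes))
print(list(ds.data_vars))
```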

This produces a NetCDF file with x and y as dimensions and three separate variables (var_t1, var_t2, var_t3).

My goal was to create a NetCDF file with x, y and t as dimensions and a single variable.

I managed to achieve this, but I feel my approach was overly complicated.

My solution was to edit the CSV file by hand, making it three times longer and adding a "t" column for the time step:

x, y, t, var_t1, var_t2, var_t3
1, 1, 0, 8,      0,      0
1, 2, 0, 6,      0,      0
2, 1, 0, 5,      0,      0
2, 2, 0, 7,      0,      0
1, 1, 1, 0,      8,      0
1, 2, 1, 0,      1,      0
2, 1, 1, 0,      3,      0
2, 2, 1, 0,      2,      0
1, 1, 2, 0,      0,      6
1, 2, 2, 0,      0,      2
2, 1, 2, 0,      0,      7
2, 2, 2, 0,      0,      6

Now, when I run

import xarray as xr
xr.Dataset.from_dataframe(df.set_index(['x', 'y', 't'])).to_netcdf('filename.nc')

I get a NetCDF file with x, y and t as dimensions in which only one variable is nonzero at each time step (e.g. at t = 1, only var_t2 != 0).
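The lengthening done by hand above is a wide-to-long reshape, which pandas can do programmatically with DataFrame.melt (a swapped-in technique, not the method used above). A minimal sketch, assuming the var_t<N> column naming from the question: instead of one mostly-zero column per time step it produces a single value column, so to_xarray yields one variable over (x, y, t). The variable name "var" is a hypothetical choice, and t takes the numbers from the column names (1, 2, 3); subtract 1 if you want 0-based steps as in the hand-edited file.

```python
import io

import pandas as pd
import xarray as xr

csv_text = """x, y, var_t1, var_t2, var_t3
1, 1, 8, 8, 6
1, 2, 6, 1, 2
2, 1, 5, 3, 7
2, 2, 7, 2, 6
"""
wide = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Wide -> long: one row per (x, y, column label), like the hand-edited CSV,
# but with a single "var" column instead of one column per time step.
long = wide.melt(id_vars=["x", "y"], var_name="t", value_name="var")

# Turn the label "var_t1" into the integer 1 (assumes the var_t<N> naming).
long["t"] = long["t"].str.extract(r"(\d+)$", expand=False).astype(int)

# x, y and t become the dimensions of one variable named "var".
ds = long.set_index(["x", "y", "t"]).to_xarray()
print(ds)
```

Because the reshape is driven by the column labels, the same code handles tens or thousands of var_t* columns without any manual editing.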

Is there a simpler way to do this, in case I run into a similar problem in the future? Editing the file by hand was manageable with only 3 time steps, but it would not scale to tens or thousands of time steps.

Thank you!


Solution

  • Say you have the DataFrame df:

    >> df
       x  y  var_t1  var_t2  var_t3
    0  1  1       8       8       6
    1  1  2       6       1       2
    2  2  1       5       3       7
    3  2  2       7       2       6
    

    You can set x and y as the index, convert the DataFrame to an xarray Dataset, stack the variables var_t1 ... var_t3 along a new dimension with to_array, and then assign new_times as the coordinates of that time dimension:

    >> ds = df.set_index(["x", "y"]).to_xarray()
    >> ds
    <xarray.Dataset>
    Dimensions:  (x: 2, y: 2)
    Coordinates:
      * x        (x) int64 1 2
      * y        (y) int64 1 2
    Data variables:
        var_t1   (x, y) int64 8 6 5 7
        var_t2   (x, y) int64 8 1 3 2
        var_t3   (x, y) int64 6 2 7 6
    
    
    >> new_times = range(3)
    >> ds_result = ds.to_array(dim="time").assign_coords(time=new_times)
    >> ds_result
    <xarray.DataArray (time: 3, x: 2, y: 2)>
    array([[[8, 6],
            [5, 7]],
    
           [[8, 1],
            [3, 2]],
    
           [[6, 2],
            [7, 6]]])
    Coordinates:
      * x        (x) int64 1 2
      * y        (y) int64 1 2
      * time     (time) int64 0 1 2
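Putting the steps together, a sketch of the full pipeline follows. It assumes the time order is the column order of the CSV (var_t1, var_t2, ...), and it names the result "var", a hypothetical name chosen here. Naming matters because the DataArray returned by to_array has no name; to_dataset(name=...) turns it into a Dataset with exactly one data variable. Recent xarray versions also offer Dataset.to_dataarray as the newer name for to_array.

```python
import io

import numpy as np
import pandas as pd
import xarray as xr

csv_text = """x, y, var_t1, var_t2, var_t3
1, 1, 8, 8, 6
1, 2, 6, 1, 2
2, 1, 5, 3, 7
2, 2, 7, 2, 6
"""
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

ds = df.set_index(["x", "y"]).to_xarray()

# Stack every data variable along a new "time" dimension. This works the same
# for 3 or 3000 var_t* columns, since it simply iterates over ds.data_vars.
da = ds.to_array(dim="time")

# Replace the variable-name labels with integer time steps 0, 1, 2, ...
da = da.assign_coords(time=np.arange(da.sizes["time"]))

# Name the array so it becomes a single NetCDF variable.
ds_out = da.to_dataset(name="var")
print(ds_out)
```

From there, ds_out.to_netcdf('filename.nc') writes one variable, var, with dimensions (time, x, y); this needs a NetCDF backend such as netCDF4 or scipy to be installed. Calling to_netcdf directly on the unnamed DataArray also works, but xarray then stores it under the placeholder name __xarray_dataarray_variable__.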