I have a CSV file with x and y coordinates, as well as variable values for three different time steps, as follows:
x, y, var_t1, var_t2, var_t3
1, 1, 8, 8, 6
1, 2, 6, 1, 2
2, 1, 5, 3, 7
2, 2, 7, 2, 6
I have learned to create a NetCDF file with the following method:
import xarray as xr
xr.Dataset.from_dataframe(df.set_index(['x', 'y'])).to_netcdf('filename.nc')
This results in a NetCDF file with x and y as dimensions and three separate variables.
My goal is to create a NetCDF file with x, y and t as dimensions and a single variable.
I managed to achieve this, but I feel like I did it in a very complicated way.
My solution was to edit the CSV file by hand, making it three times longer and adding a "t" column to represent the time step:
x, y, t, var_t1, var_t2, var_t3
1, 1, 0, 8, 0, 0
1, 2, 0, 6, 0, 0
2, 1, 0, 5, 0, 0
2, 2, 0, 7, 0, 0
1, 1, 1, 0, 8, 0
1, 2, 1, 0, 1, 0
2, 1, 1, 0, 3, 0
2, 2, 1, 0, 2, 0
1, 1, 2, 0, 0, 6
1, 2, 2, 0, 0, 2
2, 1, 2, 0, 0, 7
2, 2, 2, 0, 0, 6
Now when I apply
import xarray as xr
xr.Dataset.from_dataframe(df.set_index(['x', 'y', 't'])).to_netcdf('filename.nc')
I get a NetCDF file with x, y and t dimensions in which only one variable is nonzero at each time step (i.e. when t = 1, only var_t2 != 0).
Would there be a way to achieve this in a much simpler way, in case I encounter a similar problem in the future? This was easy to do with only 3 time steps, but I would be in trouble with tens or thousands of time steps.
Thank you!
Say you have the dataframe df:
>> df
   x  y  var_t1  var_t2  var_t3
0  1  1       8       8       6
1  1  2       6       1       2
2  2  1       5       3       7
3  2  2       7       2       6
You can set x and y as the index, convert the dataframe to an xarray Dataset, stack the variables var_t1, var_t2, ... along a new dimension, and set new_times as the coordinates of that time dimension:
>> ds = df.set_index(["x", "y"]).to_xarray()
>> ds
<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * x        (x) int64 1 2
  * y        (y) int64 1 2
Data variables:
    var_t1   (x, y) int64 8 6 5 7
    var_t2   (x, y) int64 8 1 3 2
    var_t3   (x, y) int64 6 2 7 6
>> new_times = range(3)
>> ds_result = ds.to_array(dim="time").assign_coords(time=new_times)
>> ds_result
<xarray.DataArray (time: 3, x: 2, y: 2)>
array([[[8, 6],
        [5, 7]],

       [[8, 1],
        [3, 2]],

       [[6, 2],
        [7, 6]]])
Coordinates:
  * x        (x) int64 1 2
  * y        (y) int64 1 2
  * time     (time) int64 0 1 2
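For completeness, here is a minimal end-to-end sketch of the steps above, going from the original wide CSV to a single (x, y, time) variable. The function name `wide_csv_to_dataarray`, the inline `CSV_TEXT` and the variable name "var" are made up for illustration. It assumes that every column other than x and y holds one time step and that the columns are already in chronological order, so it should work the same way with many more time steps.

```python
import io

import pandas as pd
import xarray as xr  # installing xarray is what makes DataFrame.to_xarray() work

# The sample data from the question, inlined so the sketch is self-contained.
CSV_TEXT = """x,y,var_t1,var_t2,var_t3
1,1,8,8,6
1,2,6,1,2
2,1,5,3,7
2,2,7,2,6
"""


def wide_csv_to_dataarray(csv_source, name="var"):
    """Turn a wide CSV (one column per time step) into one (x, y, time) DataArray.

    Assumption: all columns except x and y are time steps, in chronological order.
    """
    df = pd.read_csv(csv_source, skipinitialspace=True)
    # x and y become dimensions; each remaining column becomes a data variable.
    ds = df.set_index(["x", "y"]).to_xarray()
    # Stack the data variables along a new "time" dimension.
    da = ds.to_array(dim="time", name=name)
    # Replace the column names (var_t1, ...) with integer time steps 0, 1, 2, ...
    da = da.assign_coords(time=list(range(da.sizes["time"])))
    # Optional: put the dimensions in x, y, time order.
    return da.transpose("x", "y", "time")


da = wide_csv_to_dataarray(io.StringIO(CSV_TEXT))
print(da)
```

The result can then be written with `da.to_netcdf('filename.nc')`, which needs a NetCDF backend such as netCDF4, h5netcdf or scipy to be installed. Passing a file path instead of the `io.StringIO` object reads a real CSV file.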