I have 3D data with dimensions time × lon × lat, where time is recorded as year, month, and day. I need to split time into year × month-day so that the data becomes 4-dimensional. How should I do this?
A simple example dataset is given below:
import xarray as xr
import numpy as np
import pandas as pd
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
time = time[~((time.month == 2) & (time.day == 29))]
lon = np.linspace(100, 110, 5)
lat = np.linspace(30, 35, 4)
data = np.random.rand(len(time), len(lon), len(lat))
da = xr.DataArray(
    data,
    coords={"time": time, "lon": lon, "lat": lat},
    dims=["time", "lon", "lat"],
    name="pr",
)
Expected dims:
year: 2000, 2001
monthly: 01-01, 01-02, ..., 12-31
lon: ...
lat: ...
One additional question: why do .first() and .last() raise errors in the calls below, and how should I use them?
da.assign_coords(year = da.time.dt.year, monthday = da.time.dt.strftime("%m-%d")).groupby(['year', 'monthday']).first()
da.assign_coords(year = da.time.dt.year, monthday = da.time.dt.strftime("%m-%d")).groupby(['year', 'monthday']).last()
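Here is one solution: build a (year, month-day) MultiIndex on the time dimension, then unstack it into two separate dimensions.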
import xarray as xr
import numpy as np
import pandas as pd
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
time = time[~((time.month == 2) & (time.day == 29))]
lon = np.linspace(100, 110, 5)
lat = np.linspace(30, 35, 4)
data = np.random.rand(len(time), len(lon), len(lat))
da = xr.DataArray(
    data,
    coords={"time": time, "lon": lon, "lat": lat},
    dims=["time", "lon", "lat"],
    name="pr",
)
# Label every time step with its year and its month-day string (e.g. "01-31").
years = da.time.dt.year.values
month_day = da.time.dt.strftime('%m-%d').values
# Combine the two labels into a MultiIndex with one entry per time step.
multi_index = pd.MultiIndex.from_arrays([years, month_day], names=('year', 'monthly'))
# Swap the time coordinate for the MultiIndex, then unstack it into two dimensions.
da_4d = da.copy()
da_4d.coords['time'] = multi_index
da_4d = da_4d.unstack('time')
print(da_4d)
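If assigning a pandas MultiIndex directly to a coordinate produces a warning in your xarray version (newer releases may do so), the same result can probably be reached with assign_coords followed by set_index and unstack. The following is a minimal sketch under that assumption, not a tested recommendation for every xarray release. The variable name labelled is only illustrative, and the final transpose is optional: it just moves the new dimensions to the front.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same sample data as above: daily values for 2000-2001 with Feb 29 dropped.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
time = time[~((time.month == 2) & (time.day == 29))]
lon = np.linspace(100, 110, 5)
lat = np.linspace(30, 35, 4)
da = xr.DataArray(
    np.random.rand(len(time), len(lon), len(lat)),
    coords={"time": time, "lon": lon, "lat": lat},
    dims=["time", "lon", "lat"],
    name="pr",
)

# Attach year and month-day labels along the existing time dimension.
labelled = da.assign_coords(
    year=("time", da.time.dt.year.values),
    monthly=("time", da.time.dt.strftime("%m-%d").values),
)

# Replace the time index with a (year, monthly) MultiIndex, then unstack it.
da_4d = labelled.set_index(time=["year", "monthly"]).unstack("time")

# Unstacked dimensions are appended at the end, so reorder them explicitly.
da_4d = da_4d.transpose("year", "monthly", "lon", "lat")
print(da_4d.sizes)
```

With the sample data this gives an array of shape (2, 365, 5, 4). A single day can then be selected with, for example, da_4d.sel(year=2001, monthly="03-15"). Note that this only yields a gap-free array because Feb 29 was removed beforehand; with leap days kept, the 02-29 slot would be filled with NaN for non-leap years.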