python, pandas, python-xarray

Splitting the time dimension of NetCDF (.nc) data using xarray


I have 3D data with dimensions time × lon × lat, where time is recorded as year, month and day. I need to split time into two dimensions, year and month-day, so that the data becomes 4-dimensional (year × month-day × lon × lat). How should I do this?

Here is a simple example dataset:

import xarray as xr
import numpy as np
import pandas as pd

time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
time = time[~((time.month == 2) & (time.day == 29))] 

lon = np.linspace(100, 110, 5)
lat = np.linspace(30, 35, 4)
data = np.random.rand(len(time), len(lon), len(lat))

da = xr.DataArray(
    data,
    coords={"time": time, "lon": lon, "lat": lat},
    dims=["time", "lon", "lat"],
    name="pr"
)

Expected dimensions:

year: 2000, 2001

monthly: 01-01, 01-02, ..., 12-31

lon: ...

lat: ...


One additional question: why do .first() and .last() raise errors here, and how should I use them?

da.assign_coords(year = da.time.dt.year, monthday = da.time.dt.strftime("%m-%d")).groupby(['year', 'monthday']).first()
da.assign_coords(year = da.time.dt.year, monthday = da.time.dt.strftime("%m-%d")).groupby(['year', 'monthday']).last()
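A likely explanation for those errors, hedged because it depends on the installed xarray version: the failure probably comes from groupby(['year', 'monthday']) rather than from .first() or .last() themselves. Many xarray releases accept only a single group (a variable name or a DataArray) in groupby, so passing a list raises an error; support for grouping by several variables at once was added only in recent releases. With a single group key, both reductions work. The sketch below rebuilds the question's data and takes the first and last time step of each year; the names first_per_year and last_per_year are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same synthetic data as in the question (Feb 29 removed, so each year has 365 days)
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
time = time[~((time.month == 2) & (time.day == 29))]
lon = np.linspace(100, 110, 5)
lat = np.linspace(30, 35, 4)
da = xr.DataArray(
    np.random.rand(len(time), len(lon), len(lat)),
    coords={"time": time, "lon": lon, "lat": lat},
    dims=["time", "lon", "lat"],
    name="pr",
)

# Group by ONE key: "time.year" is a virtual variable derived from the time coordinate.
# first()/last() then pick the first/last time step inside each year group.
first_per_year = da.groupby("time.year").first()
last_per_year = da.groupby("time.year").last()

# "time" is reduced away and a "year" dimension of length 2 takes its place
print(first_per_year.sizes)
```

For the (year, month-day) grouping in the question, every group holds exactly one time step once Feb 29 is dropped, so .first() and .last() would return identical values; the 4D reshape in the solution below produces that result directly.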

Solution

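  • One way to do this is to build a pandas MultiIndex of (year, month-day) from the time coordinate, assign it to time, and unstack it into two dimensions: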

    import xarray as xr
    import numpy as np
    import pandas as pd
    
    time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
    time = time[~((time.month == 2) & (time.day == 29))] 
    
    lon = np.linspace(100, 110, 5)
    lat = np.linspace(30, 35, 4)
    data = np.random.rand(len(time), len(lon), len(lat))
    
    da = xr.DataArray(
        data,
        coords={"time": time, "lon": lon, "lat": lat},
        dims=["time", "lon", "lat"],
        name="pr"
    )
    
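    # Split each timestamp into a year and a month-day string ("01-01" ... "12-31")
    years = da.time.dt.year.values
    month_day = da.time.dt.strftime('%m-%d').values
    
    # Pair them into a two-level (year, monthly) index along the time dimension
    multi_index = pd.MultiIndex.from_arrays([years, month_day], names=('year', 'monthly'))
    
    # Replace the time index with the MultiIndex, then unstack it into two dimensions
    # (recent xarray releases may emit a FutureWarning about this implicit MultiIndex assignment)
    da_4d = da.copy()
    da_4d.coords['time'] = multi_index
    da_4d = da_4d.unstack('time')
    
    # Reorder dimensions to (year, monthly, lon, lat)
    da_4d = da_4d.transpose('year', 'monthly', 'lon', 'lat')
    
    print(da_4d)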
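The same reshape can also be written without building the MultiIndex by hand. The sketch below is an alternative, not the answer's own method: it attaches year and month-day as coordinates along time, promotes them to a MultiIndex with set_index, and then unstacks. It assumes every year contains the same set of month-days (true here because Feb 29 is dropped); otherwise unstack fills the missing cells with NaN.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same synthetic data as in the question
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
time = time[~((time.month == 2) & (time.day == 29))]
lon = np.linspace(100, 110, 5)
lat = np.linspace(30, 35, 4)
da = xr.DataArray(
    np.random.rand(len(time), len(lon), len(lat)),
    coords={"time": time, "lon": lon, "lat": lat},
    dims=["time", "lon", "lat"],
    name="pr",
)

da_4d = (
    # Attach year and month-day as extra coordinates along "time"
    da.assign_coords(
        year=("time", da.time.dt.year.values),
        monthly=("time", da.time.dt.strftime("%m-%d").values),
    )
    # Promote them to a (year, monthly) MultiIndex that replaces the time index
    .set_index(time=["year", "monthly"])
    # Split the MultiIndex into two separate dimensions
    .unstack("time")
    # Put the new dimensions first
    .transpose("year", "monthly", "lon", "lat")
)

print(da_4d.sizes)
```

Selecting a single day then reads naturally, for example da_4d.sel(year=2001, monthly="03-01").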