pythonnetcdfweathercdo-climateera5

Calculating ERA5 Daily Total Precipitation using CDO


Essentially, this is a repost of this question: https://confluence.ecmwf.int/pages/viewpage.action?pageId=149341027

I have downloaded ERA5 from the CDS. The input file has 24 hourly steps (0, 1, 2, 3, 4,..,23) for each calendar day starting from Jan 1 to Dec 31 of each considered year.

ECMWF state here https://confluence.ecmwf.int/display/CKB/ERA5%3A+How+to+calculate+daily+total+precipitation that daily total precipitation must be calculated by accumulating precipitation for e.g. Jan 1, 1979 by summing the steps 1, 2,...,23 of Jan 1 AND step 0 of Jan 2. It means that the step 0 of Jan 1, 1979 is not included in calculation of the total precipitation for that day. For calculation of total precipitation for Jan 2, 1979 we use also the steps 1, 2, 3,...,23 of that day plus step 0 of Jan 3 and so on.

There seems to be an option doing this in python like this:

import xarray as xr                                                    # import xarray library
ds_nc = xr.open_dataset('name_of_your_file.nc')                        # read the file
daily_precipitation = ds_nc.tp.resample(time='24H').sum('time')*1000   # calculate sum with frequency of 24h and multiply by 1000
daily_precipitation.to_netcdf('daily_prec.nc')                         # save as netCDF

Now I am wondering whether this is also possible using the Climate Data Operators (CDO) in an easy way. Normally I would do any such calculation using the daysum command in CDO, but I'm not sure this is correct.

Someone had suggested to use:

cdo -f nc copy  out.nc aux.nc
cdo -delete,timestep=1, aux.nc aux1.nc
cdo -b 32 timselsum,24 aux1.nc aux2.nc
cdo -expr,'ppt=tp*1000' -setmissval,-9999.9 -remapbil,r240x120 aux2.nc era5_ppt_prev-0_1979-2018.nc

But I'm not sure this is correct - any suggestions?


Solution

  • For these kind of issues, the useful command in CDO is shifttime, which essentially does what is says on the can and shifts the time stamp.

    This kind of problem arises frequently with any kind of flux or accumulated field where the timestamp allocated to the data value points to the END of the time accumulation period, or "window", for example, with 3 hourly TRMM data the last three hours of the day have the stamp of 00 on the date afterwards, and functions such as daymean or daysum applied directly will calculate the average of 21 hours in one day and 3 hours from the previous day, incorrectly. Shifting the timestamp by three hours so the time points to the start of the window (or indeed by 1.5, pointing to the middle) before you perform the calculation will resolve this.

    So for your specific question where you have a long series of hourly data from ERA5 and you want the daily total, you can do:

    cdo shifttime,-1hour in.nc shift.nc # now step 0 on Jan 2 has Jan 1, 23:00 stamp 
    cdo daysum shift.nc daysum.nc 
    

    or piped together:

    cdo daysum -shifttime,-1hour in.nc daysum.nc
    

    EDIT: I have now uploaded a video guide that discusses this in more detail in case it is helpful to any readers of this post.

    (NOTE: This procedure is not the same to users of fluxes from the older ERA-Interim, where the fluxes are accumulated through the short forecast period. For ERA5 the "deaccumulation" is already done for you. With ERA-Interim you need to difference consecutive timesteps to convert from an accumulated field, and there is a post here that shows how to do this with CDO or python: converting an accumulated variable to timestep values in a netcdf file with CDO )