netcdf weather nco cdo-climate

How to remove seasonality from time series data?


How can I remove the seasonal cycle from a time series stored in a netCDF file? I would like a solution that works on Linux. So far I have used GrADS and Ferret for visualization.

Thanks a lot!


Solution

  • You can use CDO to calculate the average for each day/month of the year and subtract it from the original file:

    If the file contains daily data:

    cdo ydaysub in.nc -ydaymean in.nc deseasonalized.nc  
    

    Likewise if the data is monthly:

    cdo ymonsub in.nc -ymonmean in.nc deseasonalized.nc  
    

    The ydaymean and ymonmean operators calculate the mean annual cycle over the dataset in.nc. For example, ymonmean returns 12 time slices (the average of all the Januaries, all the Februaries, and so on), which ymonsub then subtracts from the matching months of the original file. I've chained the operators in a single command, but it may be easier to understand as two separate commands:

    cdo ymonmean in.nc annual_cycle.nc
    cdo ymonsub in.nc annual_cycle.nc deseasonalized.nc
    

    This does exactly the same thing: deseasonalized.nc will be almost identical (a few bytes will differ because the "history" log in the netCDF global metadata header is different), but you will also have a new file, annual_cycle.nc, containing the annual cycle, which might also be useful.
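    To make the two steps concrete, here is a minimal pure-Python sketch of what ymonmean and ymonsub do for a single grid point. It is only an illustration of the idea, not how CDO is implemented; the function names and the made-up series (a fixed seasonal cycle plus a yearly step, starting in April) are assumptions for the example.

```python
from collections import defaultdict
from datetime import date

def monthly_climatology(dates, values):
    """Mean of all values in each calendar month (the idea behind ymonmean)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, v in zip(dates, values):
        sums[d.month] += v
        counts[d.month] += 1
    return {month: sums[month] / counts[month] for month in sums}

def remove_monthly_cycle(dates, values, climatology):
    """Subtract the mean of the matching calendar month (the idea behind ymonsub)."""
    return [v - climatology[d.month] for d, v in zip(dates, values)]

# Hypothetical data: 36 monthly values starting in April 2000.
seasonal = [0, 1, 3, 6, 9, 12, 14, 13, 10, 6, 3, 1]  # made-up cycle, Jan..Dec
dates, values = [], []
for i in range(36):
    month_index = (3 + i) % 12            # 3 -> April
    year = 2000 + (3 + i) // 12
    dates.append(date(year, month_index + 1, 1))
    values.append(seasonal[month_index] + i // 12)  # add a step of 1 per year

climatology = monthly_climatology(dates, values)
anomalies = remove_monthly_cycle(dates, values, climatology)
print(anomalies)
```

    Because the subtraction is keyed on the calendar month rather than on position in the file, it does not matter that the series starts in April: only the yearly step remains in the anomalies.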

    Why use ymonsub or ydaysub instead of sub? After all, when doing a subtraction, CDO detects that the second file has fewer time slices than the first and cycles through it. There are in fact two reasons.

    1. Safety regarding start dates with monthly data: when the seasonal cycle is calculated from the same file as the original data, simply using sub is potentially fine, because if the data starts in, say, April, the output of ymonmean will also start in April. However, if you want to remove a seasonal cycle calculated from a different source, the start month may differ and you end up subtracting, for example, the April mean from January! To avoid this, use ymonsub instead.

    2. Leap years with daily data: if you are working on a daily timescale, you definitely need to use ydaysub, because plain sub ignores leap years. Your mean seasonal cycle file has 366 time slices, since it also includes Feb 29. If you simply use sub, the two files drift out of sync at an average rate of 0.75 days per year. So if your series covers 40 years, you will be off by a month by the end of the series (oops!).
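    The leap-year drift in point 2 can be checked with a small pure-Python calculation. This sketch assumes, as described above, that a plain subtraction steps through the 366-slot climatology purely by position; the 1981-2020 date range and the helper names are hypothetical choices for the example.

```python
from datetime import date, timedelta

def calendar_slot(d):
    """Index (0..365) of d's calendar day in a 366-slot climatology with Feb 29."""
    # Map the month/day onto a leap year (2000) so that Feb 29 has its own slot.
    return date(2000, d.month, d.day).timetuple().tm_yday - 1

def positional_slot(i, n_slots=366):
    """Slot reached when cycling through the climatology by position only."""
    return i % n_slots

start = date(1981, 1, 1)    # hypothetical 40-year daily series
end = date(2020, 12, 31)
n_days = (end - start).days + 1

# How far behind the calendar is positional cycling on the last day?
drift = calendar_slot(end) - positional_slot(n_days - 1)

# How many days get a climatology value from the wrong calendar day?
mismatched = sum(
    1
    for i in range(n_days)
    if positional_slot(i) != calendar_slot(start + timedelta(days=i))
)

print(f"series length: {n_days} days")
print(f"drift on the last day: {drift} days")
print(f"days matched to the wrong slot: {mismatched}")
```

    Every non-leap year uses only 365 of the 366 slots, so positional cycling falls one slot further behind each such year, whereas matching on the calendar day (the ydaysub approach) keeps the two aligned.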

    Postscript: there are now also packages in both R and Python that give you access to the full functionality of CDO from within those languages, without having to call it through the shell.

    Edit 2021: I now have a video on this topic, which you can view here: https://youtu.be/jKlA1ouoQIs