model, netcdf, cdo-climate, nco

How to create two separate sets of data (one for daylight hours and another for nighttime hours) from hourly netcdf model output using CDO


I have hourly surface temperature data from RegCM model output. It is 3D data with a size of 303×185×1176 (lon×lat×time). The time dimension is in hours, so 1176 hours means a total of 49 days of simulation. The model hours start at 00:00 UTC on the first simulation day and end at 00:00 UTC on the last simulation day. Now, I want to create two separate sets of data from the existing data. One file will contain only daylight hours (6:30 AM–6:30 PM), and another file only nighttime hours (6:30 PM–5:30 AM). Note that as the timesteps of the model output are in UTC, I have to add 5:30 (5 hours 30 minutes) to the model hours to get IST (Indian Standard Time).

Thus, in my case, for the first day, daylight hours correspond to timesteps 1 to 13 and nighttime hours to timesteps 13 to 24, and so on. So, can anyone please guide me on how to extract the data for daylight and nighttime hours and save them to two separate netcdf files ('data_day.nc' and 'data_night.nc') using CDO? I am very new to CDO, so any kind of help will be highly appreciated. Thank you for your time and consideration.


Solution

  • The general commands are

    cdo selhour,7/19 in.nc day.nc     
    cdo selhour,0/6,20/23 in.nc night.nc 
    

    The issue is that you want local time, whereas the hour in the file is UTC. If the domain is small you can work out the local time offset by hand; for example, since you want IST, you could simply convert the IST selection windows to UTC yourself, as sketched below.
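
    For instance, a minimal sketch of that manual conversion, assuming hourly timesteps that fall on the full UTC hour: the daylight window 06:30–18:30 IST corresponds to 01:00–13:00 UTC, and the nighttime window 18:30–05:30 IST to 13:00–00:00 UTC, so

    cdo selhour,1/13 in.nc data_day.nc
    # the shared boundary hour 13 (18:30 IST) is kept in the daylight file only
    cdo selhour,0,14/23 in.nc data_night.nc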

    In general, it is much easier to do this automatically, by making a new local time variable based on your domain longitude. Note, this is a "true" solar-based local time and will not necessarily be the same as the nationally used local time, which, as you can see from the map below (taken from Wikipedia), can deviate east and west quite a bit for practical expediency and political motivation.

    National Time Zones from Wikipedia

    In theory this is straightforward to do, as you have the longitude dimension in any gridded netcdf file. The Earth rotates through 360 degrees of longitude in 24 hours, so for each degree we move east the local time shifts by 24/360 hours (= 4 minutes). In the following I resort to using the more flexible NCO, since CDO doesn't allow you to apply functions to dimensions in general, only variables.
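
    For example, a domain centred near 82.5°E gives an offset of 82.5 × 24/360 = 5.5 hours, which is exactly the 5:30 IST offset.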

    One last thing to highlight: in the following we add the longitude offset to the (UTC) time axis so that it reads as local time. For example, Italy is one hour ahead of UTC in winter, which means local 7am is actually 6am UTC, which is the timestamp in the file; adding the one-hour offset turns that timestamp into 7, so selhour then selects directly on local hours.
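
    To give a concrete case for the question: a record stamped 02:00 UTC plus the 5:30 IST offset reads as 07:30 local, so a selection such as selhour,7/19 on the shifted file correctly counts it as a daylight hour.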

    SOLUTION 1 - simple method for mean offset

    In the following I assume your time dimension is called "time" (hopefully it is, for CF compliance) and that the units of time are "hours since ..."; if they are "seconds since ..." then of course you need to scale the offset by 3600 (a variant for that case is sketched after the command below). I also assume your longitude is a 1D coordinate vector called "lon" (check with ncdump -h).

    The command below then takes the average of the longitude across the domain and converts it into an hour offset, which is added to the time variable so that it reads as local time:

    # add the mean-longitude offset (in hours) to the UTC time axis
    ncap2 -O -s 'time=time+lon.avg($lon)*24/360' in.nc shifted.nc
    
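    If the time units turn out to be "seconds since ..." rather than "hours since ...", a minimal variant of the same command (simply scaling the hour offset by 3600, with the same assumptions as above) would be:

    ncap2 -O -s 'time=time+lon.avg($lon)*24/360*3600' in.nc shifted.nc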

    You can then apply the commands above, but I would repeat the boundary hours in both selections to ensure you don't miss any steps (you may double-count a step if it falls exactly on 7 or 19, but I presume that is not a problem):

    cdo selhour,7/19 shifted.nc day.nc     
    cdo selhour,0/7,19/23 shifted.nc night.nc 
    
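    As a quick sanity check (ntime and showtimestamp are standard CDO info operators), you can confirm that the two files together account for all 1176 timesteps, allowing for any double-counted boundary steps:

    cdo ntime day.nc
    cdo ntime night.nc
    cdo showtimestamp day.nc | head   # note these timestamps are now local time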

    CAVEATS with the simple approach:

    1. If the domain is very "wide", i.e. it crosses several time zones, then your cut will not be exactly local 7-19, of course, as this is the average offset across the domain.
    2. If you have a rotated grid or another projection, the true longitude may be a 2D field, so the simple average along the longitude dimension will only be approximate (see the sketch just below).
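
    For instance, if the true longitude in your RegCM file is stored as a 2D field (the name xlon here is an assumption; check ncdump -h for the actual name), a sketch of the same mean-offset trick would be:

    # average the 2D longitude field over all its dimensions and
    # add the resulting hour offset to the UTC time axis
    ncap2 -O -s 'time=time+xlon.avg()*24/360' in.nc shifted.nc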

    SOLUTION 2: creating a full 3D local hour array and using that to mask

    We can get around the two caveats above by taking a different approach, whereby we use the power of ncap2 to create a 3D variable that stores the local hour at each grid point. This time, though, we retain all timesteps when we split the file into two, and instead mask out the unwanted values (as the cut is different at each location, by definition).

    # first make a 3D local-hour variable; % is the modulus operator:
    ncap2 -O -s 'localhr[$time,$lat,$lon]=(time+lon*24/360) % 24' in.nc lochr.nc
    
    # now use this to mask the data for variable called "var" e.g.
    # note, this also avoids the double counting issue :-)
    
    ncap2 -s 'where(localhr <= 7 || localhr >= 19) var=var.get_miss()' lochr.nc day.nc
    ncap2 -s 'where(localhr > 7 && localhr < 19) var=var.get_miss()' lochr.nc night.nc
    

    This is much more precise as it cuts exactly on the local hour for every point, and thus can work with other grid projections etc.
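
    Once you have the two masked files, you may want to drop the helper variable again and then process each file further; a sketch using standard NCO/CDO operators (the output file names are just examples):

    # remove the helper localhr variable from each output
    ncks -O -x -v localhr day.nc data_day.nc
    ncks -O -x -v localhr night.nc data_night.nc

    # e.g. the mean over the remaining (unmasked) daylight values
    cdo timmean data_day.nc data_day_mean.nc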

    If you have several variables, I think you can loop over them like this, applying each mask to the output of the previous pass so that the edits accumulate (untested):

    cp lochr.nc day.nc ; cp lochr.nc night.nc
    for var in t2m precip ; do 
        ncap2 -O -s "where(localhr <= 7 || localhr >= 19) ${var}=${var}.get_miss()" day.nc day.nc
        ncap2 -O -s "where(localhr > 7 && localhr < 19) ${var}=${var}.get_miss()" night.nc night.nc
    done
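
    Alternatively, a sketch of doing it in a single pass with ncap2's block form of where (the variable names t2m and precip are again just examples):

    ncap2 -O -s 'where(localhr <= 7 || localhr >= 19){ t2m=t2m.get_miss(); precip=precip.get_miss(); }' lochr.nc day.nc
    ncap2 -O -s 'where(localhr > 7 && localhr < 19){ t2m=t2m.get_miss(); precip=precip.get_miss(); }' lochr.nc night.nc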