I need a help regarding CDO operation on a netcdf file. I downloaded dataset for 40 years from ERA5 over a grid region and I masked variable values for a range (30-50) to 1 and other values to 0 using cdo.
cdo -expr,'var2=var*(var>=30 && var<50)' data1.nc data2.nc
Now I want to calculate number of times each grid cell recorded var2= 1 consecutively for 5 days but less than 10 days in the last 40 years. Is that possible using cdo or nco?
First of all, I'm assuming your input has been converted to daily, you don't say.
Then you also need to clarify the question. Your title originally said "how to count the number of days?", but that was a bit ambiguous?
Let's say you have a series like this that represents an 8 day event:
0 0 1 1 1 1 1 1 1 1 0 0 0 0
Does that count as a single occurrence? Your text seemed to imply this was the case, but the title not, I think you wanted to know the "number of events" not "days", so I edited your title to agree with the main text of the question, I hope this interpretation is correct.
I think you can do it but the solution is a bit longwinded. You can use runsum
to give you a "1" for any day which is 1 and is on the end of a series of N days like this:
cdo gec,N -runsum,N in.nc out5.nc
But that doesn't totally answer your question. For example if N=5 this would convert the above series into this:
0 0 0 0 0 0 1 1 1 1 0 0 0 0
i.e there are 4 days on the end of a 5 day series.
How can we get an upper limit to the length of an event? Well if we do the same calculation for >10 day events, and add together, we get
So we just add the two series and pick out the 1s to get the range of event lengths you require:
cdo gec,10 -runsum,10 in.nc out10.nc
# only keep events of 5,6,7,8 and 9 days in length:
cdo eqc,1 -add out5.nc out10.nc out5-10.nc
Okay now we have a file where var=1 when it is at the end of a series of at least 5 but less than ten days.
Now this is cool part, we can apply the same technique using runmean/runsum to pick up the START and END of each of these series, and then we can add up these events. If we apply a runsum
with a window size of 2, this produces 1 for a sequence of "0 1" or "1 0" i.e. it picks up the start and end points of each event.
cdo eqc,1 -runsum,2 out5-10.nc out_start_end.nc
This command turns our example series into the following, since we've seen only a sequence of "0 1" or "1 0" results in a 1:
0 0 0 0 0 0 1 0 0 1 0 0 0 0
Now we just need to sum this in time and divide by 2 (I told you it was long winded!)
cdo divc,2 -timsum out_start_end.nc number_of_events.nc
ta da!
Note 1 that if the whole input series ends mid-event e.g. 0 0 1 1 1 , this method will count this an a "half" event, since you only pick up the start. Round down to the nearest integer if this upsets you.
Putting this all together (and you can probably pipe to combine some of this), here is the whole solution involving 10 cdo commands summarized:
cdo gec,5 -runsum,5 in.nc out5.nc
cdo gec,10 -runsum,10 in.nc out10.nc
cdo eqc,1 -add out5.nc out10.nc out5-10.nc
cdo eqc,1 -runsum,2 out5-10.nc out_start_end.nc
cdo divc,2 -timsum out_start_end.nc number_of_events.nc
Note 2, the runsum
commands will use the window mid-point for the date/timestamp, but that is not important for this use-case. If anyone wants to also use the outN.nc files to see when the event days are, then it is usual to lag the time stamp using --timestat_date last
, see this video for more details.
Note 3 If you sum the series of days within the events, you can now divide this by the number of events to get the mean event length.