I have a rrd file with average ping times to a server (GAUGE) every minute and when the server is offline (which is very frequent for reasons that doesn't matter now) it stores a NaN/unknown.
I'd like to create a graph with the percentage the server is offline each hour which I think can be achieved by counting every NaN within 60 samples and then dividing by 60.
For now I get to the point where I define a variable that is 1 when the server is offline and 0 otherwise, but I already read the docs and don't know how to aggregate this:
DEF:avg=server.rrd:rtt:AVERAGE CDEF:offline=avg,UN,1,0,IF
Is it possible to do this when creating a graph? Or I will have to store that info in another rrd?
I don't think you can do exactly what you want, but you have a couple of options.
You can define a sliding window average, that shows the percentage of the previous hour that was unknown, and graph that, using TRENDNAN
.
DEF:avg=server.rrd:rtt:AVERAGE:step=60
CDEF:offline=avg,UN,100,0,IF
CDEF:pcavail=offline,3600,TREND
LINE:pcavail#ff0000:Availability
This defines avg
as the 1-min time series of ping data. Note we use step=60
to ensure we get the best resolution of data even in a smaller graph. Then we define offline
as 100 when the server is there, 0 when not. Then, pcavail
is a 1-hour sliding window average of this, which will in effect be the percentage of time during the previous hour during which the server was available.
However, there's a problem in that RRDTool will silently summarise the source data before you get your hands on it, if there are many data points to a pixel in the graph (this won't happen if doing a fetch
of course). To get around that, you'd need to have the offline
CDEF done at store time -- IE, have a COMPUTE
type DS that is 100 or 0 depending on if the avg
DS is known. Then, any averaging will preserve data (normal averaging omits the unknowns, or the xff
setting makes the whole cdp
unknown).
rrdtool create ...
DS:rtt:GAUGE:120:0:9999
DS:offline:COMPUTE:rtt,UN,100,0,IF
rrdtool graph ...
DEF:offline=server.rrd:offline:AVERAGE:step=3600
LINE:offline#ff0000:Availability
If you are able to modify your RRD, and do not need historical data, then use of a COMPUTE in this way will allow you to display your data in a 1-hour stepped graph as you wanted.