linux rrdtool

How to aggregate data by period in a rrdtool graph


I have an RRD file that stores the average ping time to a server (GAUGE) every minute. When the server is offline (which happens very often, for reasons that don't matter here), it stores NaN/unknown.

I'd like to create a graph showing the percentage of each hour that the server is offline, which I think can be achieved by counting the NaNs in every 60 samples and dividing by 60.

So far I have got to the point of defining a variable that is 1 when the server is offline and 0 otherwise, but even after reading the docs I don't know how to aggregate it:

DEF:avg=server.rrd:rtt:AVERAGE
CDEF:offline=avg,UN,1,0,IF

Is it possible to do this when creating the graph, or will I have to store that information in another RRD?


Solution

  • I don't think you can do exactly what you want, but you have a couple of options.

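    You can define a sliding window average that shows the percentage of the previous hour that was unknown, and graph that, using TREND. (TRENDNAN should give the same result here, because a value built with UN is always 0 or 100 and never unknown itself.)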

    DEF:avg=server.rrd:rtt:AVERAGE:step=60
    CDEF:offline=avg,UN,100,0,IF
    CDEF:pcoffline=offline,3600,TREND
    LINE:pcoffline#ff0000:Offline
    

    This defines avg as the 1-minute time series of ping data. Note that we use step=60 to ensure we get the best resolution of data even in a smaller graph. Then we define offline as 100 when the ping value is unknown (server offline) and 0 when it is known. Finally, pcoffline is a 1-hour sliding window average of this, which is in effect the percentage of the previous hour during which the server was offline.

    However, there is a catch: if there are many data points per pixel in the graph, RRDTool will silently consolidate the source data before your CDEF ever sees it (this won't happen if you are doing a fetch, of course). To get around that, the offline calculation needs to be done at store time, i.e. with a COMPUTE type DS that is 100 when the rtt DS is unknown and 0 when it is known. Any later averaging then preserves the information (normal averaging either omits the unknowns or, depending on the xff setting, makes the whole CDP unknown).

    rrdtool create ...
    DS:rtt:GAUGE:120:0:9999
    DS:offline:COMPUTE:rtt,UN,100,0,IF
    
    rrdtool graph ...
    DEF:offline=server.rrd:offline:AVERAGE:step=3600
    LINE:offline#ff0000:Offline
    

    If you are able to modify your RRD and do not need historical data, then using a COMPUTE DS in this way will let you display your data in a graph with 1-hour steps, as you wanted.
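
To make the arithmetic concrete, here is a minimal plain-Python sketch of what the two approaches compute. It is only a model, not RRDTool code: the helper names (offline_flags, sliding_mean, bucket_mean, mean_ignoring_nan) and the sample data are made up for illustration, and it ignores details such as xff and how RRDTool aligns windows to step boundaries.

```python
import math

NAN = float("nan")


def offline_flags(samples):
    """100 for each unknown (NaN) sample, 0 otherwise, like x,UN,100,0,IF."""
    return [100.0 if math.isnan(s) else 0.0 for s in samples]


def sliding_mean(values, window):
    """Mean of the `window` values ending at each position (rough model of TREND).

    Positions that do not yet have a full window are returned as NaN.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(NAN)
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


def bucket_mean(values, size):
    """Mean of each consecutive block of `size` values (rough model of AVERAGE consolidation)."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, size)
    ]


def mean_ignoring_nan(values):
    """Average that skips unknowns, as a plain AVERAGE of the ping times would."""
    known = [v for v in values if not math.isnan(v)]
    return sum(known) / len(known) if known else NAN


# Hypothetical data: two hours of 1-minute ping samples.
hour1 = [20.0] * 45 + [NAN] * 15   # 15 unknown minutes
hour2 = [NAN] * 30 + [25.0] * 30   # 30 unknown minutes
samples = hour1 + hour2

flags = offline_flags(samples)

# Flag stored per minute, then averaged per hour (the COMPUTE DS idea).
hourly = bucket_mean(flags, 60)

# Flag computed per minute, then smoothed over the previous hour (the TREND idea).
trend = sliding_mean(flags, 60)

# Averaging the ping times first and testing for unknown afterwards loses the outage.
flag_after_averaging = offline_flags([mean_ignoring_nan(hour1)])

print(hourly)                # hourly offline percentages
print(trend[59], trend[89], trend[119])
print(flag_after_averaging)
```

With 15 unknown minutes in the first hour and 30 in the second, the hourly buckets come out as 25 and 50. Averaging the raw ping times first yields a known value for the first hour, so its flag is 0 and the outage disappears, which is why the flag has to be computed before any consolidation happens.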