opentsdb downsampling bosun

How do I use win.disk.duration in Bosun or how do I downsample a counter type metric?


I'm using Bosun to gather information about the average response time of a hard disk (win.disk.duration). It produces a graph such as:

[image: graph of win.disk.duration]

I understand that this is showing the rate of change of the WMI reported value, AvgDiskSecPerRead.

What I want to do is downsample this calculated value by taking the max over a time period. However, if I set a downsample of Max with a window of 5m, I get this:

[image: graph of win.disk.duration with a 5m Max downsample]

Note the Y-axis scale change.

How can I downsample the rate to get the results I'm expecting (or why doesn't what I'm asking for make sense)?


Solution

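  • Because this metric is a counter, you currently can't downsample it correctly with OpenTSDB (the datasource I assume you are using, given the question tags). OpenTSDB got the order of operations wrong in its design, and that hasn't been fixed yet: it downsamples the raw counter values first and only then calculates the rate, so a Max downsample picks the largest raw counter value in each window rather than the largest rate. More information is in this Google Groups thread. In short, the rate calculation (derivative) needs to happen earlier in the order of operations.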

    Also, be aware that on our side (scollector) we currently seem to have gotten a related metric, win.disk.percent_time, wrong. That still has to be looked into (tracked in this issue).
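
To illustrate why the order of operations matters, here is a minimal Python sketch. It is not OpenTSDB or Bosun code: the helper names (rate, downsample_max), the sample counter values, and the 60-second sampling interval are all made up for the example. It compares taking the rate first and then the max per window (what the question wants) with taking the max per window first and then the rate (the order described above).

```python
def rate(points):
    """Per-second rate of change between consecutive (timestamp, value) points."""
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]


def downsample_max(points, window):
    """Max value in each fixed window, stamped with the window's start time."""
    buckets = {}
    for t, v in points:
        start = t - t % window
        buckets[start] = max(buckets.get(start, v), v)
    return sorted(buckets.items())


# Hypothetical counter sampled every 60 s, with one burst between t=120 and t=180.
counter = [
    (0, 0.0), (60, 1.0), (120, 2.0), (180, 8.0), (240, 9.0), (300, 10.0),
    (360, 11.0), (420, 12.0), (480, 13.0), (540, 14.0), (600, 15.0),
]
WINDOW = 300  # 5 minutes, in seconds

# Desired order: compute the rate, then take the max rate per window.
rate_then_max = downsample_max(rate(counter), WINDOW)

# Order described above: max of the raw counter per window, then the rate.
max_then_rate = rate(downsample_max(counter, WINDOW))

print("rate then max:", rate_then_max)
print("max then rate:", max_then_rate)
```

In this made-up data, the first ordering keeps the burst as the maximum rate of the first window, while the second ordering only sees the difference between the per-window maxima of the raw counter, so the burst disappears and the values land on a different scale.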