graphite whisper

whisper aggregation not working for older data points


Carbon storage schema (storage-schemas.conf):

[default]  
pattern = .*  
retentions = 5m:15d,15m:1y,1h:10y,1d:100y

storage-aggregation.conf:

[all_sum]  
pattern = .*  
xFilesFactor = 0.1  
aggregationMethod = sum  

Now I am feeding entries like this:

echo "rec.test 25 $(date --date="-6 minute" +%s)" | nc localhost 2003  
echo "rec.test 50 $(date --date="-3 minute" +%s)" | nc localhost 2003  
echo "rec.test 100 $(date +%s)" | nc localhost 2003  
echo "rec.test 1 $(date --date="-1 year" +%s)" | nc localhost 2003  
echo "rec.test 4 $(date --date="-1 year minute" +%s)" | nc localhost 2003  
echo "rec.test 6 $(date --date="-1 year -1 minute" +%s)" | nc localhost 2003  
echo "rec.test 8 $(date --date="-1 year -2 minute" +%s)" | nc localhost 2003  

On the Grafana graph, I can see the aggregation (sum) for the recently fed values. But the values from 1 year ago are not aggregated. In fact, only one value, 8 (the latest entry in the 1-hour window), is shown instead of 4+6+8=18.

What could be missing in the configuration?


Solution

  • There is a buffer mechanism in carbon-aggregator that stores values received during the finest retention period and emits the aggregated value.

    In your example, 5m:15d means that the buffer will store all points received in the last 5 minutes and regularly emit their sum to carbon-cache (which writes it to the whisper file).

    That explains the normal workflow of points in graphite.

    Example:

      Metrics received:
      hello.world 42  1427615689 (15 minutes ago)
      hello.world 1   1427615869 (12 minutes ago)
      hello.world 1   1427615929 (11 minutes ago)
      hello.world 314 1427616049 (9 minutes ago)
      hello.world 1   1427616051 (~9 minutes ago)
    

    will write 2 points to the whisper file:

    1427615689 44 (42+1+1)
    1427615989 315 (314+1)
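
The grouping above can be reproduced with a few lines of Python. This is only a sketch of the idea, not carbon-aggregator's actual code: the function name `aggregate_sum` is made up, and the windows are anchored at the first timestamp simply so that the output matches the example.

```python
from collections import defaultdict

RESOLUTION = 300  # seconds, i.e. the "5m" of the 5m:15d retention


def aggregate_sum(points, resolution=RESOLUTION):
    """Sum (timestamp, value) pairs per fixed-width window.

    Hypothetical helper for illustration: windows start at the earliest
    timestamp, which is a simplification chosen to match the example.
    """
    if not points:
        return {}
    start = min(ts for ts, _ in points)
    buckets = defaultdict(int)
    for ts, value in points:
        # Index of the 5-minute window this point falls into.
        window_index = (ts - start) // resolution
        buckets[start + window_index * resolution] += value
    return dict(sorted(buckets.items()))


points = [
    (1427615689, 42),
    (1427615869, 1),
    (1427615929, 1),
    (1427616049, 314),
    (1427616051, 1),
]
result = aggregate_sum(points)
print(result)
```

The first three points fall into the first 5-minute window and the last two into the second, which gives the two sums listed above.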
    

    However, a buffer is dropped when the first point of the buffer is older than a given threshold.

    The threshold is computed in a way that allows late points to be aggregated (if points come a few seconds after the normal window of 5 minutes), but this has to stop somewhere (otherwise all points would have to be stored in carbon-aggregator's memory forever). This threshold is resolution * settings['MAX_AGGREGATION_INTERVALS'], where MAX_AGGREGATION_INTERVALS defaults to 5.

    In your case the threshold is 5 minutes * 5 = 25 minutes, so every point received more than 25 minutes after the timestamp it carries will find a deleted buffer. In this case graphite will create a new buffer and emit "the aggregated" value to whisper, overwriting the correct value.
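
As a rough illustration of that arithmetic, assuming the 5-minute resolution from the question and the default of 5 intervals (the helper names `age_threshold` and `arrives_too_late` are invented for this sketch and are not part of carbon):

```python
RESOLUTION_SECONDS = 5 * 60      # finest retention in the question: 5m
MAX_AGGREGATION_INTERVALS = 5    # default value mentioned above


def age_threshold(resolution=RESOLUTION_SECONDS,
                  max_intervals=MAX_AGGREGATION_INTERVALS):
    """Maximum age (in seconds) before a buffer is considered dropped."""
    return resolution * max_intervals


def arrives_too_late(point_timestamp, received_at):
    """True if the point is older than the threshold when it is received.

    Simplified model: the real check is made on the buffer, not on each
    individual point.
    """
    return received_at - point_timestamp > age_threshold()


threshold = age_threshold()
now = 1427616589  # arbitrary "current" time for the example
three_minutes_old = arrives_too_late(now - 3 * 60, now)
one_year_old = arrives_too_late(now - 365 * 24 * 3600, now)
print(threshold, three_minutes_old, one_year_old)
```

With these numbers the threshold is 1500 seconds (25 minutes): a point that is 3 minutes old is still aggregated normally, while a point stamped one year in the past is far beyond it.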

    In the previous example, if you send a point:

    hello.world 100  1427615690 (~15 minutes ago)
    

    more than 25 minutes after the time of emission, it will overwrite the value in whisper. You'll get:

    1427615689 100 (100)
    1427615989 315 (314+1)
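
A toy model of that overwrite, to make the mechanism concrete. The dictionaries `whisper` and `buffers` and the functions `receive` and `flush` are stand-ins invented for this sketch; they only mimic the behaviour described here and are not graphite's real data structures.

```python
# Values already written for the two 5-minute windows.
whisper = {1427615689: 44, 1427615989: 315}

# The buffer that produced 44 was dropped long ago, so nothing is pending.
buffers = {}


def receive(window_start, value):
    """Add a value to the window's buffer, creating a fresh one if needed."""
    buffers.setdefault(window_start, []).append(value)


def flush():
    """Emit the sum of each pending buffer; the write replaces the old value."""
    for window_start, values in buffers.items():
        whisper[window_start] = sum(values)
    buffers.clear()


# Late point for the first window: the new buffer only knows about 100.
receive(1427615689, 100)
flush()
print(whisper)
```

Because the new buffer has no memory of 42, 1 and 1, the emitted "sum" is just 100, and it replaces the 44 that was stored before.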
    

    Late points are a corner case of graphite's buffer design (and of most time series databases). If you know that some points can come late, you can try to increase the MAX_AGGREGATION_INTERVALS setting, but I would recommend storing them elsewhere first and reconciling them offline with what is stored in graphite.