I have a gauge metric badness which goes up when my service is performing poorly. There is one gauge per instance of the service and I have many instances.
I can take a max over all instances so that I can see how bad the worst instance is:
max(badness)
This graph is noisy because the identity of the worst instance, and how bad it is, changes frequently. I would like to smooth it out by applying a moving average. However, this doesn't work (I get a PromQL syntax error):
avg_over_time(max(badness)[1m])
How can I apply avg_over_time() to a time series that has already been aggregated with the max() operator?
My backend is VictoriaMetrics so I can use either MetricsQL or pure PromQL.
The avg_over_time(max(process_resident_memory_bytes)[5m]) query works without issues in VictoriaMetrics, because MetricsQL accepts it. It may fail if you use promxy in front of VictoriaMetrics, since promxy doesn't support MetricsQL - see this issue for details.
The query can be fixed so that it also works in Prometheus and promxy - just add a colon after 5m in square brackets:
avg_over_time(max(process_resident_memory_bytes)[5m:])
This is called a subquery in the Prometheus world. See more details about subquery specifics in VictoriaMetrics and Prometheus in this article.
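Applied to the badness metric from the question, the same fix might look like the sketch below. The 15s resolution after the colon is only an assumption for illustration; it controls how often the inner max(badness) is evaluated inside the 1m window, so pick a value close to your scrape interval. If the resolution is left empty, as in [1m:], Prometheus falls back to its global evaluation interval.

```promql
# Moving average over the last 1m of the worst instance's badness.
# The inner max(badness) is evaluated every 15s (assumed resolution),
# and avg_over_time() then averages those points.
avg_over_time(max(badness)[1m:15s])
```

A longer window (for example 5m instead of 1m) gives a smoother graph at the cost of reacting more slowly to real changes.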