I need to write a query that looks for sudden drops in traffic, so I am trying to get the current 10 minutes and compare it to the previous few 10-minute buckets.
Here I get 6 values of total traffic, one per 10 minutes over the past hour. Seems easy enough:
sum(
increase(
istio_requests_total{destination_workload="service-v1", reporter="source"}[10m:10m]
)
)
This yields the following result. Notice I only get 6 values with no interpolation, exactly what I want.
Time Value
2023-08-25 14:50:00 5
2023-08-25 15:00:00 7
2023-08-25 15:10:00 8
2023-08-25 15:20:00 6
2023-08-25 15:30:00 9
2023-08-25 15:40:00 5
To get the current 10 minutes only, I tried:
last_over_time(
sum(
increase(
istio_requests_total{destination_workload="service-v1", reporter="source"}[10m:10m]
)
)[10m:10m]
)
But bizarrely, the interpolation comes back and I get a value every 5s (the default interval):
Time Value
2023-08-25 14:47:10 5
2023-08-25 14:47:15 5
2023-08-25 14:47:20 5
[ ... one row every 5 seconds for the whole hour ]
I am expecting no interpolation like the first example and just 1 value:
Time Value
2023-08-25 15:40:00 5
The reason I cannot just shrink the overall time range to 10 minutes is that I need the other values to compute the avg, min, and max of all buckets except the current one to build the baseline. Then I can compare the current 10-minute bucket with the average of the 10-minute buckets over the past hour to detect sudden changes, and also check whether the current 10 minutes is lower than the minimum 10-minute value, so the alert self-throttles unless things get worse.
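For clarity, here is a rough Python sketch of the baseline check described above, applied to the six totals from the first table. It assumes the 15:40 row is the "current" bucket and the five earlier rows form the baseline. Only the "lower than the minimum" rule is implemented, because the question doesn't say what threshold the comparison with the average should use. The function name is hypothetical.

```python
from statistics import mean

# The six 10-minute totals from the first table, oldest first (14:50 .. 15:40).
# Assumption: the last entry is the "current" bucket.
buckets = [5, 7, 8, 6, 9, 5]


def baseline_check(values):
    """Hypothetical helper: build the baseline from every bucket except the
    current (last) one and flag a drop only when the current bucket is
    below the smallest baseline bucket (the self-throttling rule)."""
    *previous, current = values
    avg, lo, hi = mean(previous), min(previous), max(previous)
    is_drop = current < lo
    return current, avg, lo, hi, is_drop


print(baseline_check(buckets))
```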
This is not going into Grafana, so Grafana-specific solutions unfortunately won't work.
How can I get <agg>_over_time() functions to honor the specified resolution of 10m and not interpolate the value across every 5s interval?
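It seems you have a misconception about the resolution used in subqueries like [10m:10m]. The resolution only sets the spacing of the points evaluated inside the subquery's range. If the subquery is used directly to return a range vector, it does behave as you expect, but that is a rare case, and this behavior doesn't extend to the results of functions that take range vectors as input.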
To get only one result every 10 minutes, you need to pass the appropriate step parameter with your request (in this case step=10m).
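As a sketch of what "passing step" means outside Grafana, the snippet below builds a URL for the Prometheus HTTP API endpoint /api/v1/query_range, which takes query, start, end and step parameters. The server address http://localhost:9090 and the end timestamp (2023-08-25 15:40:00, assuming the table is in UTC) are hypothetical placeholders. I also swapped the subquery for a plain [10m] range selector, which is my assumption about what you want per step. Sending the request is left to whatever HTTP client you use.

```python
from urllib.parse import urlencode

PROMETHEUS = "http://localhost:9090"  # hypothetical server address


def range_query_url(query: str, start: int, end: int, step: str) -> str:
    """Build a /api/v1/query_range URL; start and end are Unix seconds,
    step is a Prometheus duration string such as "10m"."""
    params = {"query": query, "start": start, "end": end, "step": step}
    return f"{PROMETHEUS}/api/v1/query_range?{urlencode(params)}"


# Plain [10m] range selector instead of the [10m:10m] subquery (assumption).
query = ('sum(increase(istio_requests_total{destination_workload="service-v1",'
         ' reporter="source"}[10m]))')
end = 1692978000    # 2023-08-25 15:40:00 UTC, a 10-minute boundary (hypothetical "now")
start = end - 3600  # one hour earlier
url = range_query_url(query, start, end, "10m")
print(url)
```

With step=10m, the server evaluates the expression only at start, start + 10m, ..., end, so you get one value per 10 minutes without any subquery tricks.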
increase(something[10m:10m]) generally isn't expected to produce anything, because increase needs at least 2 points to calculate a result. On the other hand, something like increase(something[10m:5m]) is expected to produce a result at every step. So the fact that you got 6 points with your initial query is a lucky coincidence.
Here is a demo of different approaches. Notice that the first two graphs have a value every 14 seconds (the default step there), while the third one has a value only once every 600 seconds. The last one doesn't produce anything (maybe it would if you were lucky and pressed "Execute" at exactly a round 10-minute time? I'm not sure about that).
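To make the "lucky coincidence" more concrete, here is a small Python model (my own simplification, not Prometheus code) of which timestamps a subquery evaluates at a single evaluation time. It assumes subquery points are aligned to absolute multiples of the resolution. The left_open flag toggles whether a point exactly at t - range is included. As far as I know, Prometheus 2.x used closed ranges and 3.x switched to left-open ones, so check your version. Under the closed-range assumption, only evaluation times landing exactly on a 10-minute boundary see two points. With a 5s step those boundaries are hit, which would explain why the first query returned values only at :50, :00, :10 and so on.

```python
def subquery_points(t: int, range_s: int, step_s: int, left_open: bool) -> list[int]:
    """Aligned timestamps (Unix seconds) that expr[range:step] evaluates at time t.
    Assumption: points sit on absolute multiples of step_s; left_open excludes
    a point exactly at t - range_s."""
    start = t - range_s
    first = (start // step_s) * step_s  # largest aligned time <= start
    if first < start or (left_open and first == start):
        first += step_s
    return list(range(first, t + 1, step_s))


def increase_possible(points: list[int]) -> bool:
    # increase() needs at least two samples in its window.
    return len(points) >= 2


aligned = 1692978000  # a 10-minute boundary
print(subquery_points(aligned, 600, 600, left_open=False))      # closed range, aligned t
print(subquery_points(aligned + 5, 600, 600, left_open=False))  # 5 s later
print(subquery_points(aligned + 5, 600, 300, left_open=False))  # [10m:5m] instead
```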
I believe you also misunderstand last_over_time. It is used to "stretch" the last seen value over periods where no samples exist. Here is a demo of what it does.
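To illustrate the stretching, here is a rough Python model of last_over_time (my own simplification, not Prometheus source). At every evaluation time t it returns the newest sample in the window (t - range, t]. The inner values at 0, 600 and 1200 seconds are hypothetical stand-ins for "one value per 10-minute boundary". Evaluating the outer function every 5 seconds, as the second query did, repeats each value at every step until the next one arrives. The left-open window is an assumption; with a closed window these particular results would be the same.

```python
def last_over_time(samples: dict[int, float], t: int, range_s: int):
    """samples maps Unix-second timestamps to values. Returns the newest value
    in (t - range_s, t], or None when the window is empty."""
    in_window = [ts for ts in samples if t - range_s < ts <= t]
    return samples[max(in_window)] if in_window else None


# Hypothetical inner result: one value per 10-minute boundary.
inner = {0: 5.0, 600: 7.0, 1200: 8.0}

# Outer evaluation every 5 seconds, like the default step in the question.
outer = {t: last_over_time(inner, t, 600) for t in range(0, 1201, 5)}
print(outer[5], outer[595], outer[600], outer[1195])
```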
I'm not sure what you are trying to do in your case, as it's not clear how you intend to use it later, but generally, to receive the last value (relative to the passed time parameter), you don't need to do anything special: just pass your query. (Or maybe you wanted to use something like @end()?)
Regarding
compare the current 10 minute bucket with the average of each 10 minute bucket of the past hour to detect sudden changes, and to make sure if the current 10 min is lower than the minimum 10 minute value
If all you want is to find whether the counter's increase over the current 10 minutes is lower than its increase in each of the 5 previous 10-minute windows, you can simply use the following query (note that offset must directly follow the range selector, inside the function call):
increase(my_counter[10m])
< increase(my_counter[10m] offset 10m)
< increase(my_counter[10m] offset 20m)
< increase(my_counter[10m] offset 30m)
< increase(my_counter[10m] offset 40m)
< increase(my_counter[10m] offset 50m)
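Why the chain works: a PromQL comparison without the bool modifier acts as a filter. It keeps the left-hand series whose value satisfies the comparison, and it keeps the left-hand value. So a < b < c is evaluated as (a < b) < c. Only series whose current increase is below every earlier window survive. The Python sketch below models vectors as dicts keyed by a label string, which is a simplification. The service names and numbers are hypothetical.

```python
def lt_filter(left: dict[str, float], right: dict[str, float]) -> dict[str, float]:
    """Model of PromQL `left < right` (no bool): keep matching left-hand
    samples that are smaller than the right-hand value, with the left value."""
    return {k: v for k, v in left.items() if k in right and v < right[k]}


def chained_lt(current: dict[str, float], *previous: dict[str, float]) -> dict[str, float]:
    # Left-associative: ((current < p1) < p2) < ...
    result = current
    for window in previous:
        result = lt_filter(result, window)
    return result


# Hypothetical per-service increases.
current = {"service-v1": 5.0, "service-v2": 40.0}
previous = [
    {"service-v1": 9.0, "service-v2": 30.0},  # offset 10m
    {"service-v1": 6.0, "service-v2": 35.0},  # offset 20m
    {"service-v1": 8.0, "service-v2": 38.0},  # offset 30m
    {"service-v1": 7.0, "service-v2": 33.0},  # offset 40m
    {"service-v1": 6.5, "service-v2": 31.0},  # offset 50m
]
alerts = chained_lt(current, *previous)
print(alerts)
```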
EDIT: to accommodate the rule "return a value if the current 10-minute increase is less than 50% of the average increase over the previous 50 minutes", you can use a query like:
increase(my_counter[10m]) < increase(my_counter[50m] offset 10m) / 5 * 0.5
Here:
- increase(my_counter[10m]) is the increase over the "current" 10 minutes,
- increase(my_counter[50m] offset 10m) is the increase over the previous 50 minutes. It is approximately equal to increase(my_counter[10m] offset 10m) + increase(my_counter[10m] offset 20m) + increase(my_counter[10m] offset 30m) + increase(my_counter[10m] offset 40m) + increase(my_counter[10m] offset 50m) (not exactly, because increase extrapolates at window boundaries),
- / 5: since the right-hand window is 5 times longer, divide by 5 to make the time windows comparable,
- * 0.5: your threshold of 50%.
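As a worked example, here is the arithmetic of that rule applied to the totals from the question's first table. This assumes 15:40 is the "current" bucket and the five earlier rows are the previous 50 minutes. It also treats the 50-minute increase as exactly the sum of the five buckets, ignoring extrapolation. The function name is hypothetical.

```python
def below_half_of_average(current: float, previous_50m_total: float,
                          windows: int = 5, ratio: float = 0.5) -> bool:
    """Mirrors: increase(my_counter[10m]) < increase(my_counter[50m] offset 10m) / 5 * 0.5"""
    threshold = previous_50m_total / windows * ratio
    return current < threshold


previous_total = 5 + 7 + 8 + 6 + 9    # 14:50 .. 15:30 buckets
current = 5                           # 15:40 bucket
threshold = previous_total / 5 * 0.5  # average of 7 per bucket, halved
fires = below_half_of_average(current, previous_total)
print(previous_total, threshold, fires)
```

With these numbers the average per 10 minutes is 7, so the threshold is 3.5. A current value of 5 does not trigger the alert, while a value of 3 would.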