I am getting confused about how the subquery notation in Prometheus (like [5m:1m]) aggregates data. I know that when I use a range selector such as [5m], it aggregates every data point in each 5-minute window. But when I plot graphs for [5m:1m], it shows an entirely different set of data, and I do not know how it got there. The queries I am using are:
sum_over_time(metric_name{labelName="label"}[5m])
sum_over_time(metric_name{labelName="label"}[5m:1m])
How are these two different, and how do they calculate the data?
Subqueries are an inline substitute for recording rules. They behave very similarly, but without actually producing a stored metric.
So an expression like sum_over_time( (<some_query>) [5m:1m]) is equivalent to sum_over_time(instance:some_query_result:resolution[5m]) with evaluation_interval set to 1m and the recording rule
- record: instance:some_query_result:resolution
  expr: <some_query>
in place, where <some_query> is any query returning instant vector results.
Subqueries in PromQL are often used as a substitute for range selectors over something other than a vector selector. For example, you can use max_over_time(my_metric[2m]), but you cannot use max_over_time( (my_metric + my_other_metric) [2m]): you need to use something like max_over_time( (my_metric + my_other_metric) [2m:1s]) instead.
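To make the max_over_time example more concrete, here is a rough Python sketch of what evaluating (my_metric + my_other_metric) [2m:1s] amounts to. It is only a simulation under stated assumptions, not Prometheus code: the sample data, the helper names instant_value and max_over_subquery, and the left-open window (start, end] are all made up for illustration, and staleness and lookback handling are ignored.

```python
import bisect

def instant_value(series, t):
    """Latest sample at or before t; series is a sorted list of (timestamp, value).
    Simplification: no staleness or lookback limit is applied."""
    timestamps = [ts for ts, _ in series]
    i = bisect.bisect_right(timestamps, t) - 1
    return series[i][1] if i >= 0 else None

def max_over_subquery(a, b, eval_time, range_s, step_s):
    """Evaluate a + b at every aligned step inside the window, then take the max."""
    start = eval_time - range_s
    first = (start // step_s + 1) * step_s  # first multiple of step_s after start
    values = []
    for t in range(first, eval_time + 1, step_s):
        va, vb = instant_value(a, t), instant_value(b, t)
        if va is not None and vb is not None:
            values.append(va + vb)
    return max(values) if values else None

# Hypothetical samples, one every 30 seconds.
my_metric = [(0, 1.0), (30, 5.0), (60, 2.0), (90, 3.0), (120, 1.0)]
my_other_metric = [(0, 10.0), (30, 10.0), (60, 20.0), (90, 10.0), (120, 10.0)]

if __name__ == "__main__":
    # Roughly max_over_time((my_metric + my_other_metric)[2m:1s]) at t=120
    print(max_over_subquery(my_metric, my_other_metric, 120, 120, 1))  # 22.0
```

Note that the result (22.0) is not the same as adding the two individual maxima (5.0 + 20.0), which is why the sum has to be evaluated step by step before max_over_time is applied.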
Of course, a subquery can also be used with simple vector selectors. In that case it behaves as a range selector, but with an additional resolution.
Queries with resolution are executed in the following manner: the timeframe is divided into blocks of resolution length, and the expression inside the subquery is evaluated at the end of each block. Then the results of each evaluation are gathered according to the range provided.
So in your example, sum_over_time(metric_name{labelName="label"}[5m:1m]), the value of metric_name{labelName="label"} will be taken at each minute and then put into a range of 5 minutes' length. This results in 5 samples, one for each minute, being put into the range vector and later summed up by sum_over_time.
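The difference between the two queries from the question can be sketched with a small Python simulation. This is only an illustration under assumptions that are not in the question: a 15-second scrape interval with scrapes landing exactly on multiples of 15 seconds, a gauge whose value is always 2.0, and a left-open window (start, end]. The function names are hypothetical.

```python
SCRAPE_INTERVAL = 15  # seconds between raw samples (assumed)
RANGE = 5 * 60        # the 5m range
STEP = 60             # the 1m resolution

def raw_sample(t):
    """Hypothetical stored value of the metric at timestamp t."""
    return 2.0

def aligned_timestamps(eval_time, window, interval):
    """Multiples of interval inside the left-open window (eval_time - window, eval_time]."""
    start = eval_time - window
    first = (start // interval + 1) * interval
    return list(range(first, eval_time + 1, interval))

def sum_range_selector(eval_time):
    """sum_over_time(metric[5m]): every raw sample in the window is summed."""
    return sum(raw_sample(t) for t in aligned_timestamps(eval_time, RANGE, SCRAPE_INTERVAL))

def sum_subquery(eval_time):
    """sum_over_time(metric[5m:1m]): one instant lookup per aligned minute is summed."""
    total = 0.0
    for t in aligned_timestamps(eval_time, RANGE, STEP):
        latest = t - t % SCRAPE_INTERVAL  # most recent scrape at or before t
        total += raw_sample(latest)
    return total

if __name__ == "__main__":
    print(sum_range_selector(3600))  # 40.0 (20 raw samples)
    print(sum_subquery(3600))        # 10.0 (5 samples, one per minute)
```

With these assumptions the range selector sums 20 raw samples while the subquery sums only 5, which would explain why the two graphs look so different.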
A couple of important notes:
- Subquery evaluation is aligned to the resolution: with a resolution of 1m, evaluation will take place at every :00 second; with 1d, every day at midnight (UTC); and with 17m, every 17 minutes, with the alignment point being 0 in epoch time [1].
- The resolution can be larger than the range, as in sum_over_time(up[1h:1d]), and it will produce interesting, but meaningful results.
- Functions like rate, delta and more expect at least two samples to be present in the provided range vector. Thus queries like rate(metric[1m:1m]) will not yield any results, and something like rate(metric[1m:57s]) will result in a noncontinuous graph. Demo.

[1]: not sure if it's officially guaranteed, as I haven't seen it in the documentation, but this is how Prometheus actually carries out resolution (and the evaluation of recording rules), and I don't see any reason for this to change in the future.
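The alignment and the two-sample requirement can be checked with a short Python sketch. It assumes that steps are multiples of the resolution counted from epoch 0 (as described above) and that the range window is left-open, (start, end]; the helper names subquery_steps and can_compute_rate are invented for this illustration and are not part of Prometheus.

```python
def subquery_steps(eval_time, range_s, step_s):
    """Aligned evaluation timestamps inside (eval_time - range_s, eval_time]."""
    start = eval_time - range_s
    first = (start // step_s + 1) * step_s  # first multiple of step_s after start
    return list(range(first, eval_time + 1, step_s))

def can_compute_rate(eval_time, range_s, step_s):
    """rate-like functions need at least two samples in the range vector."""
    return len(subquery_steps(eval_time, range_s, step_s)) >= 2

if __name__ == "__main__":
    # [1m:1m]: a 60-second window always holds exactly one aligned step.
    print(subquery_steps(150, 60, 60))  # [120]
    # [1m:57s]: the window holds two steps only at some evaluation times.
    for t in (114, 116, 117, 120):
        print(t, subquery_steps(t, 60, 57), can_compute_rate(t, 60, 57))
```

Under these assumptions [1m:1m] never provides two samples, while [1m:57s] provides two only for a few seconds out of every 57, which matches the gaps in the graph.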