I am trying to get the total shard sizes from the _internal InfluxDB database. The field is called diskBytes, and it is stored every 10 seconds by default. It is queried by something like
SELECT diskBytes FROM "shard" WHERE "database"='_internal' AND time>t1 AND time<t2
There is a value for each of the paths (labelled by a tag), and to get the total size across all paths I then need to sum them.
The data looks like this:
If I try to group them by a small time interval (less than the 10s collection rate), then it works correctly using the query
SELECT sum(diskBytes) FROM "shard" WHERE "database"='_internal' AND time>t1 AND time<t2 GROUP BY time(5s)
However, I would now like to query the data at a lower rate, let's say once every 10 minutes or something. Then have it return the first/mean value over those 10 minutes. If I group by anything longer than 10s, then you can see below that it doesn't work (it just sums across all values in the time interval, rather than summing over paths only):
What's the right way to query this?
(I am using InfluxDB v1.8)
In the end, this needed a "subquery" :
SELECT first("diskBytes_summed") FROM (
SELECT sum("diskBytes") AS "diskBytes_summed" FROM "shard"
WHERE "database"='_internal' AND time>t1 AND time<t2 GROUP BY time(5s) fill(null)
)
GROUP BY time(10m)
The inner query makes sure that all data are collected and summed properly, and the outer query then allows downsampling to a more coarse time grid.