I am very new to Influx DB, the flux query language, and time-series databases in general. I have a bucket filled with firewall log entries, and I'd like to count the number of entries grouped by source IP address. In SQL I'd do something like this:
select
_value,
count(*) as count
from my_bucket
where _field = 'ufw_src_ip'
group by _value;
After many hours of banging my head against the wall, I have come up with this:
from(bucket: "my_bucket")
|> range(start: -30d)
|> filter(fn: (r) => r["_field"] == "ufw_src_ip")
|> group(columns: ["_value"])
|> count()
but to no avail, it gives me
cannot aggregate columns that are part of the group key
So how do I actually translate this logic to its flux equivalent? Is it even possible to do it?
After some more trial and error I managed to get it working:
from(bucket: "my_bucket")
|> range(start: -30d)
|> filter(fn: (r) => r["_field"] == "ufw_src_ip")
|> group(columns: ["_value"])
|> count(column: "_field")
|> map(fn: (r) => ({
"Source IP": r._value,
"Count": r._field
}))
|> sort(columns: ["Count"], desc: true)
The trick was to count the values from a column that is not part of the group key, so in this case counting the _field
name column worked (whereas not explicitly specifying a column would cause it to count the _value
column which is part of the group key).
I am still not entirely sure why this restriction exists, but at this point I'm just happy it's working :P
Leaving this here for anybody coming from an SQL-background who might be running into a similar issue.