I have Prometheus with some alerting rules defined, and I want statistics on the number of alerts fired by Prometheus.
I tried to count how many times an alert has fired in Grafana, but it doesn't work:
SUM(ALERTS{alertname="XXX", alertstate="firing"})
Is there a way to count how many times an alert has fired?
Your query returns how many alerts are firing now, not how many times each alert was fired.
I've found this query to (mostly) work with Prometheus 2.4.0 and later:
changes(ALERTS_FOR_STATE[24h])
It will return the number of times each alert went from "pending" to "firing" during the last 24 hours, meaning it will only work for alerts that have a pending state in the first place (i.e. alerts with for: <some_duration> specified).
ALERTS_FOR_STATE is a newly added Prometheus-internal metric that is used for restoring alerts after a Prometheus restart. It's not all that well documented (not at all, actually), but it seems to work.
Oh, and if you want the results grouped by alert (or environment, or job, or whatever) you can sum the results by that label or set of labels:
sum by(alertname) (changes(ALERTS_FOR_STATE[24h]))
will give you how many times each alert fired across jobs, environments etc.
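As a rough sketch of the grouping idea above, here are two variations. The job label is only an assumption for illustration (your alerts may carry different labels), and $__range is a Grafana dashboard variable rather than PromQL itself, so the second query is meant for a Grafana panel, not the Prometheus expression browser.

```promql
# Group by more than one label; "job" is a hypothetical label on your alerts.
sum by (alertname, job) (changes(ALERTS_FOR_STATE[24h]))

# In a Grafana panel, follow the dashboard's time range instead of a fixed 24h.
sum by (alertname) (changes(ALERTS_FOR_STATE[$__range]))
```

Like the queries above, both should only report alerts that have a for: duration, since they rely on the same ALERTS_FOR_STATE metric.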