terraform datadog

How do I count the total threshold breaches using a Datadog monitor query?


I have the query below for a Datadog monitor. It sets a latency threshold of 10000 ms for put operations on a map.

However, I would like to count how many times the 10000 ms threshold is breached within a 30-minute period, and trigger an alert if it is breached at least 4 times.

How can this query be updated to count the threshold violations in the last 30 minutes?

avg(last_10m):sum:hazelcast.imap.local_total_put_latency{env:dev,name:initMap,service:myService} by {host} / sum:hazelcast.imap.local_put_operation_count{env:dev,name:initMap,service:myservice} by {host} > 10000

Solution

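  • Resolved by creating a second monitor: a Datadog event monitor that counts how many times the latency monitor above has triggered in the last 30 minutes and alerts when that count reaches 4. So essentially you need two monitors, the metric monitor for the 10000 ms threshold and an event monitor on top of it. Surprisingly, AppDynamics (AppD) can achieve this with a single monitor.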
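
Since the question is tagged terraform, the two-monitor setup could be sketched roughly as follows. This is a minimal sketch, not the exact configuration used: the resource names, messages, and the `breach_source` tag are hypothetical, and the event search string in the second monitor (including `source:alert` and `status:error`) is an assumption that should be checked against what the first monitor's alerts actually look like in the Events Explorer.

```hcl
terraform {
  required_providers {
    datadog = {
      source = "DataDog/datadog"
    }
  }
}

# Monitor 1: the latency monitor from the question, unchanged.
# It triggers per host when average put latency exceeds 10000 ms.
resource "datadog_monitor" "put_latency" {
  name    = "initMap put latency above 10000 ms"
  type    = "query alert"
  message = "Average put latency on initMap exceeded 10000 ms on {{host.name}}."
  query   = "avg(last_10m):sum:hazelcast.imap.local_total_put_latency{env:dev,name:initMap,service:myService} by {host} / sum:hazelcast.imap.local_put_operation_count{env:dev,name:initMap,service:myservice} by {host} > 10000"

  monitor_thresholds {
    critical = 10000
  }

  # Hypothetical tag, used only so the event monitor below can find
  # the alert events generated by this monitor.
  tags = ["breach_source:initmap_put_latency"]
}

# Monitor 2: an event monitor that counts how often monitor 1 triggered
# in the last 30 minutes and alerts at 4 or more occurrences.
resource "datadog_monitor" "put_latency_breach_count" {
  name    = "initMap put latency breached 4+ times in 30 minutes"
  type    = "event-v2 alert"
  message = "The put latency monitor has triggered at least 4 times in the last 30 minutes."

  # ASSUMPTION: the search string inside events("...") is illustrative.
  # Verify in the Events Explorer which source, status, and tags the
  # alert events of monitor 1 actually carry, and adjust accordingly.
  query = "events(\"source:alert status:error breach_source:initmap_put_latency\").rollup(\"count\").last(\"30m\") >= 4"

  monitor_thresholds {
    critical = 4
  }
}
```

Note that this counts trigger events of the first monitor rather than individual data points above 10000 ms, so a monitor that stays in the alert state without recovering may contribute only one event to the count.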