Tags: prometheus, prometheus-alertmanager, prometheus-blackbox-exporter

Creating an alert with Prometheus every time there is an error


I am new to Prometheus and alerting systems. I have developed a microservice and added metrics code that increments a counter whenever there is an error. Now I am trying to create an alert so that whenever the error count increases, it fires and sends an email, but I am unable to form a proper query for this scenario. I tried something like error_total > 0, but that fires all the time, since the count stays above 0 unless we reset it manually.


Solution

  • What you are looking for is the increase function. The following expression triggers an alert whenever there was at least one error in the previous 15 minutes:

    expr: increase(my_error_metric[15m]) > 0
    annotations:
      summary: "Hey! There were {{ $value }} errors in the last 15 minutes"
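
    For context, the fragment above has to live inside a rule group in a rule file. A minimal sketch of such a file follows; the file name, group name, alert name, and severity label are assumptions chosen for illustration, not something from the question:

    ```yaml
    # error_alerts.yml (hypothetical file name), referenced from the
    # rule_files section of prometheus.yml
    groups:
      - name: error-alerts              # hypothetical group name
        rules:
          - alert: ErrorsDetected       # hypothetical alert name
            # True whenever the counter grew during the last 15 minutes
            expr: increase(my_error_metric[15m]) > 0
            labels:
              severity: warning         # hypothetical label, useful for routing
            annotations:
              summary: "Hey! There were {{ $value }} errors in the last 15 minutes"
    ```

    The file can be checked with promtool check rules error_alerts.yml before reloading Prometheus. Note that increase extrapolates over the window, so {{ $value }} may not be a whole number.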
    

    Errors are common in microservices, and alerting on each one is generally unmanageable. A more common strategy is to alert only when the error rate exceeds a given threshold (for example, 5%):

    expr: rate(my_error_metric[2m]) / rate(number_of_call[2m]) * 100 > 5
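
    As a fuller sketch of the threshold approach, the rule below wraps the ratio in a complete rule group. It uses rate(), which the Prometheus documentation recommends for alerting rules, and sum() so that both sides of the division carry matching (empty) label sets. The group name, alert name, the 5m window, and the for duration are assumptions to adjust to your scrape interval and tolerance:

    ```yaml
    groups:
      - name: error-rate-alerts         # hypothetical group name
        rules:
          - alert: HighErrorRate        # hypothetical alert name
            # Percentage of calls that failed, averaged over 5 minutes.
            # sum() aggregates away labels so the two sides can be divided.
            expr: |
              sum(rate(my_error_metric[5m]))
                /
              sum(rate(number_of_call[5m])) * 100 > 5
            # Only fire if the condition holds continuously for 5 minutes
            for: 5m
            annotations:
              summary: "Error rate is {{ $value }}% (threshold 5%)"
    ```

    When there are no calls at all, the division yields NaN, the comparison is false, and the alert stays silent.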
    

    Alerting on increase also means that some errors can go unnoticed: if another error occurs while the first alert is still firing (for example, during investigation), there won't be a second alert. The new error is simply folded into the one that is already active.
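
    Sending the mail itself is Alertmanager's job. A minimal sketch of an alertmanager.yml with an email receiver is shown below; every host name, address, and interval is a placeholder assumption. The repeat_interval setting re-sends the notification while the alert keeps firing, which partly mitigates the "no second alert" problem:

    ```yaml
    # alertmanager.yml: all values below are placeholders
    global:
      smtp_smarthost: "smtp.example.com:587"   # hypothetical SMTP server
      smtp_from: "alertmanager@example.com"    # hypothetical sender address
    route:
      receiver: team-email
      group_by: ["alertname"]
      group_wait: 30s        # wait before sending the first notification
      group_interval: 5m     # wait before notifying about new alerts in a group
      repeat_interval: 1h    # re-send while the alert is still firing
    receivers:
      - name: team-email
        email_configs:
          - to: "oncall@example.com"           # hypothetical recipient
            send_resolved: true                # also mail when the alert clears
    ```

    Most SMTP servers also require credentials (smtp_auth_username and smtp_auth_password), which are omitted here.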