
Resolved alert resets repeat_interval in Alertmanager


I have the following Alertmanager configuration:

 config:
  global: 
    resolve_timeout: 5m
  route:
    group_by: ['alert_manager_group_by']
    group_wait: 30s
    group_interval: 15m
    repeat_interval: 30m
    receiver: 'alerting-metrics'
    routes:
    - receiver: 'alerting-metrics'
      matchers:
        - group =~ "TestGroup"
      continue: true
    - receiver: 'teams'
      matchers:
        - group =~ "TestGroup"
      continue: true
    - receiver: 'null'
      matchers:
        - alertname =~ "InfoInhibitor|Watchdog"
  receivers:
    - name: 'null'
    - name: 'alerting-metrics'
      webhook_configs:
      - send_resolved: false
        url: 'my webhook'
    - name: 'teams'
      webhook_configs:
      - send_resolved: false
        url: http://prometheus-msteams:2000/high_priority_channel

One of the receivers is a webhook where I receive the alerts; it keeps a Prometheus counter so I can track how many alerts were fired.

"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 3:45:13.104 PM","ts=2024-03-18T15:45:13.104Z caller=notify.go:743 level=debug component=dispatcher receiver=alerting-metrics integration=webhook[0] msg=""Notify success"" attempts=1"

This notification was sent 45 minutes after the previous one, because no new alert was added to the group (group_interval + repeat_interval).

About 30 minutes after that notification, the alert arrives with resolved status:

"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:15:12.988 PM","ts=2024-03-18T16:15:12.988Z caller=dispatch.go:163 level=debug component=dispatcher msg=""Received alert"" alert=Prometheus_P01_Alert[6fdc8ff][resolved]"
"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:15:13.063 PM","ts=2024-03-18T16:15:13.063Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup=""{}/{group=~\""TestGroup""}:{alert_manager_group_by=\""Front_Door\""}"" msg=flushing alerts=[Prometheus_P01_Alert[6fdc8ff][resolved]]"
"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:15:13.063 PM","ts=2024-03-18T16:15:13.063Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup=""{}/{group=~\""TestGroup""}:{alert_manager_group_by=\""Front_Door\""}"" msg=flushing alerts=[Prometheus_P01_Alert[6fdc8ff][resolved]]"

Then a new active alert comes in:

"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:16:42.989 PM","ts=2024-03-18T16:16:42.989Z caller=dispatch.go:163 level=debug component=dispatcher msg=""Received alert"" alert=Prometheus_P01_Alert[6fdc8ff][active]"

and the notification is sent 30 seconds later:

"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:17:12.990 PM","ts=2024-03-18T16:17:12.990Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup=""{}/{group=~\""TestGroup""}:{alert_manager_group_by=\""Front_Door\""}"" msg=flushing alerts=[Prometheus_P01_Alert[6fdc8ff][active]]"
"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:17:12.990 PM","ts=2024-03-18T16:17:12.990Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup=""{}/{group=~\""TestGroup""}:{alert_manager_group_by=\""Front_Door\""}"" msg=flushing alerts=[Prometheus_P01_Alert[6fdc8ff][active]]"
"alertmanager-monitoring-emea-dev-kube-p-alertmanager-0","3/18/2024, 4:17:13.016 PM","ts=2024-03-18T16:17:13.016Z caller=notify.go:743 level=debug component=dispatcher receiver=alerting-metrics integration=webhook[0] msg=""Notify success"" attempts=1"

The notification was sent regardless of when the last notification was sent.

Question: Is this normal behaviour? Does Alertmanager reset repeat_interval when an alert is resolved?


Solution

  • The Alertmanager documentation describes repeat_interval as:

    How long to wait before sending a notification again if it has already
    been sent successfully for an alert.

    In other words, an alert that is still firing triggers another notification once repeat_interval has passed since the last successful notification for that alert.

    Since your initial alert was resolved, it no longer exists, and repeat_interval no longer applies to it.

    A new alert, even if it has the same labels, is treated as a separate object with its own lifecycle.


    If you are interested in how to prevent new notifications from the same alert rule for some time, even if the initial alert was resolved, you can check this answer.
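
    One possible approach, which may or may not be what the linked answer proposes, is to stop the alert from resolving during short recoveries on the Prometheus side. In your logs the alert resolved at 16:15:12 and fired again at 16:16:42, about 90 seconds later. The sketch below uses the keep_firing_for field of a Prometheus alerting rule (available in Prometheus 2.42 and later, as far as I know). The group name, expr, and both durations are placeholders chosen for illustration; only the alert name and labels are taken from your logs.

    ```yaml
    groups:
      - name: example-rules              # hypothetical group name
        rules:
          - alert: Prometheus_P01_Alert  # alert name as seen in the logs
            expr: up == 0                # placeholder expression, not from the question
            for: 1m                      # assumed value
            keep_firing_for: 5m          # assumed value: stay firing for 5m after expr stops matching
            labels:
              group: TestGroup
              alert_manager_group_by: Front_Door
    ```

    With a setting like this, a recovery shorter than keep_firing_for would not produce a resolved alert, so Alertmanager would keep treating it as the same alert and repeat_interval would continue to apply.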