monitoring nagios failovercluster icinga

Monitor Failovercluster roles with Icinga2


I'm using Icinga2 with NSClient++.

I have a PowerShell check for certain cluster roles, and it is installed on every cluster node. If a cluster role fails, all cluster nodes send identical notifications, which results in a lot of spam for a single actual service problem.

Installing the check on only one cluster node is not an option, because it would create a single point of failure for role monitoring: a failing cluster node should not affect the cluster roles (aside from a short timeout), but I would not be able to check any cluster role once that node is down.

Is it possible to assign a service to a hostgroup in a way that only one notification will be sent if this service fails?


Solution

  • I ended up having the check itself decide whether it should report a problem as critical (the service failed on the node itself) or as warning/OK (the service failed on another node).
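
The decision described above could look roughly like the following. This is only a sketch of the logic, written in Python for illustration rather than as the actual PowerShell check. The function name `evaluate_role`, its parameters, and the role/node names are hypothetical. It also assumes the check already knows the role's state and its owner node, and that the usual Nagios-style plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) are in use.

```python
# Nagios-style plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_role(role_name, role_state, owner_node, local_node,
                  remote_failure=WARNING):
    """Return (exit_code, message) for one cluster role as seen from one node.

    Assumption: role_state is a string such as "Online" or "Failed", and
    owner_node is the node that currently (or last) owned the role.
    """
    if role_state.strip().lower() == "online":
        code = OK
    elif owner_node.strip().lower() == local_node.strip().lower():
        # The failed role belongs to this node: report the real problem here.
        code = CRITICAL
    else:
        # Another node owns the failed role: downgrade to WARNING (or OK).
        code = remote_failure
    message = "{} - role '{}' is {} (owner: {})".format(
        LABELS[code], role_name, role_state, owner_node)
    return code, message


# Example with made-up names: the role failed on the node running the check.
code, message = evaluate_role("SQL-Role", "Failed", "NODE-A", "node-a")
print(code, message)  # 2 CRITICAL - role 'SQL-Role' is Failed (owner: NODE-A)
```

With this approach only the owning node reports critical, so the spam goes away only if notifications are configured to fire on critical states and not on warnings; that part of the setup is an assumption here.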