I'm using Icinga2 with NSClient++. I have a PowerShell check for certain cluster roles, and this check is installed on every cluster node.
Should a cluster role fail, all cluster nodes would send out identical notifications, resulting in a lot of spam for just one actual service problem.
Installing the check on only one cluster node is not an option, as it would create a single point of failure for role monitoring: a failing cluster node should not affect the cluster roles (aside from a short timeout), but I would no longer be able to check any cluster role as soon as that node is down.
Is it possible to assign a service to a hostgroup in a way that only one notification is sent if this service fails?
I ended up having the check itself decide whether it should report a problem as CRITICAL (the service failed on the node itself) or as WARNING/OK (the service failed on another node).
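A minimal sketch of that decision logic follows. It is written in Python purely for illustration; the real check is PowerShell. It assumes the check can learn the role's current state and owner node (on Windows failover clusters, `Get-ClusterGroup` reports `State` and `OwnerNode`), and it treats any state other than "Online" as a failure. The names `evaluate_role` and `remote_failure_status` are hypothetical. The return values follow the standard Nagios/Icinga plugin exit codes.

```python
from enum import IntEnum


class Status(IntEnum):
    # Standard Nagios/Icinga plugin exit codes.
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def evaluate_role(role_state: str, owner_node: str, local_node: str,
                  remote_failure_status: Status = Status.OK) -> Status:
    """Decide what this node should report for one cluster role (hypothetical).

    role_state            -- state of the role, e.g. "Online" or "Failed"
    owner_node            -- name of the node currently owning the role
    local_node            -- name of the node running the check
    remote_failure_status -- what to report when the role failed on another node
    """
    # Assumption: anything other than "Online" counts as a problem.
    if role_state.casefold() == "online":
        return Status.OK
    # Windows node names are case-insensitive, so compare them that way.
    if owner_node.casefold() == local_node.casefold():
        # The role failed on this node: only the owner raises CRITICAL.
        return Status.CRITICAL
    # The role failed on another node, which reports CRITICAL itself,
    # so this node stays quiet (OK by default) or downgrades to WARNING.
    return remote_failure_status
```

Because only the node that owns the failed role returns CRITICAL, a notification rule that fires on CRITICAL should produce a single alert per role failure, while every node keeps monitoring the roles. The script would exit with `int(status)` so NSClient++ can pass the result on to Icinga2.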