I believe the following summary macros do not account for passive services:
$TOTALSERVICESCRITICALUNHANDLED$ (this is the one where I see the problem directly)
And I assume the following two have the same issue:
$TOTALSERVICESWARNINGUNHANDLED$
$TOTALSERVICESUNKNOWNUNHANDLED$
Passive services that are NOT in downtime and NOT acknowledged correctly show up on the Unhandled Services page of Nagios Core. But a script I'm using prints a value of $TOTALSERVICESCRITICALUNHANDLED$ that does not include passive services that are in a critical state, not in downtime, and not acknowledged.
The wording in the documentation for this macro says the service must have checks enabled, but this probably does not take passive checks into account:
" This macro reflects the total number of services that are currently in a CRITICAL state that are not currently being "handled". Unhandled services problems are those that are not acknowledged, are not currently in scheduled downtime, and for which checks are currently enabled. "
My setup: I have a command that is executed by a regularly scheduled service. The command passes the value of the macro $TOTALSERVICESCRITICALUNHANDLED$ to a script. The script just echoes the value of that macro.
Test: All services are in downtime except my passive service, which has passive checks enabled (active checks disabled) and is in a critical state. The script prints "0" for the number of unhandled critical alerts (this is incorrect!). When I enable active checks on the passive service, the script prints "1".
Nagios Core version: 4.3.2
Please advise whether this is a bug that was addressed in a later version, or whether there is a workaround. I have seen this related issue, which was fixed in 4.2.2, but it is a different issue: viewtopic.php?t=39957
I ended up making this change to the source code. A service that is in downtime or acknowledged is already excluded from the unhandled count by the other checks. So the check for checks_enabled is redundant for those cases, and it incorrectly throws out passive services (it seems checks_enabled is a flag that only represents ACTIVE checks).
In common/macros.c, starting at line 1216: comment out the 3 instances, 2 lines each, where it checks "if(temp_service->checks_enabled == FALSE) problem = FALSE;"
(And then rebuild Nagios Core)
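To illustrate what the change does, here is a minimal, self-contained sketch of the counting logic. It is not the actual code from common/macros.c: the struct svc, the function count_unhandled_critical, and the require_checks_enabled switch are hypothetical stand-ins for illustration. The field names mirror the ones I believe the Nagios service struct uses.

```c
#include <assert.h>
#include <stddef.h>

#define STATE_CRITICAL 2 /* numeric value Nagios uses for CRITICAL */

/* Hypothetical, cut-down stand-in for the Nagios service struct. */
struct svc {
    int current_state;
    int scheduled_downtime_depth;      /* > 0 means in downtime */
    int problem_has_been_acknowledged; /* non-zero means acknowledged */
    int checks_enabled;                /* appears to cover ACTIVE checks only */
};

/*
 * Count CRITICAL services that are neither in downtime nor acknowledged.
 * require_checks_enabled = 1 imitates the stock behaviour described above,
 * require_checks_enabled = 0 imitates the patched behaviour.
 */
int count_unhandled_critical(const struct svc *list, size_t n,
                             int require_checks_enabled)
{
    int unhandled = 0;

    assert(list != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const struct svc *s = &list[i];
        int problem = 1;

        if (s->current_state != STATE_CRITICAL)
            continue;
        if (s->scheduled_downtime_depth > 0)
            problem = 0;
        if (s->problem_has_been_acknowledged)
            problem = 0;
        /* This is the test the patch comments out. */
        if (require_checks_enabled && !s->checks_enabled)
            problem = 0;

        if (problem)
            unhandled++;
    }
    return unhandled;
}
```

With one service in downtime and one passive-only service in CRITICAL (checks_enabled = 0), the sketch reproduces my test: the stock logic counts 0 unhandled services and the patched logic counts 1.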
The only way I can see this coming back to bite me is if there's a case where an active service has its active checks disabled and is also not in downtime or acknowledged. With this patch, such a service would now be counted as unhandled.
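A possible way around that edge case, which I have not tested against Nagios itself, would be to keep the test but treat a service as "checked" when either active or passive checks are enabled. The sketch below assumes the service struct exposes a passive flag named accept_passive_checks; that name is an assumption on my part and should be verified against the headers of the version being built. The struct svc_flags and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the service struct; field names are assumptions. */
struct svc_flags {
    int checks_enabled;        /* active checks */
    int accept_passive_checks; /* passive checks (assumed field name) */
};

/* A service counts as "checked" if results can arrive by either route. */
bool svc_is_checked(const struct svc_flags *s)
{
    assert(s != NULL);
    return s->checks_enabled || s->accept_passive_checks;
}

/*
 * Replacement for the "checks_enabled == FALSE" test: returns the new value
 * of the problem flag, clearing it only when the service is not checked
 * at all.
 */
bool still_a_problem(const struct svc_flags *s, bool problem)
{
    if (!svc_is_checked(s))
        return false;
    return problem;
}
```

Under these assumptions a passive-only service stays in the unhandled count, while a service with both active and passive checks disabled is still excluded, as the macro documentation describes.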