I'm trying to turn a Nagios-NRPE check into a Check_MK one. The first one is:
check_procs -w 10 -c 15 -C crond
My attempt uses the State and count of processes rule, but it always raises a critical alert. The parameters of my rule (extracted from the rules.mk configuration file) are:
'process': 'crond'
'okmax': 10
'okmin': 1
'warnmax': 15
'warnmin': 11
As the WATO config screen says nothing about critical thresholds, I assumed that any value outside the thresholds above raises a critical alert.
My problem is: when this rule is active, a critical alert is raised even when the number of processes found is inside the OK range.
The Status detail of the alert is:
CRIT - 7 processes (ok from 1 to 15)CRIT 1620.6 MB virtual, 28.2 MB resident, 2.7% CPU
I cannot understand this behaviour, and I suspect that I either misunderstand the Check_MK threshold parameters or am missing something.
Can you help me?
Thanks in advance.
As I suspected in the last paragraph of my question, I misunderstood the Check_MK threshold parameters.
These are the Python code lines found in ~/share/check_mk/checks/ps:
state = 0
if count > params["warnmax"] or count < params["warnmin"]:
    state = 2
    infotext += " (ok from %d to %d)(!!)" % (params["okmin"], params["okmax"])
elif count > params["okmax"] or count < params["okmin"]:
    state = 1
    infotext += " (ok from %d to %d)(!)" % (params["okmin"], params["okmax"])
So any value lower than warnmin raises a critical alert. To prevent this, the warn interval must include the ok one. In my example, the warnmin value should be lowered to match the okmin one:
'process': 'crond'
'okmax': 10
'okmin': 1
'warnmax': 15
'warnmin': 1
In mathematical terms, the ok interval must be a subinterval of the warn one.
I wrongly guessed these intervals should not overlap, but actually they must.
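To see the difference between the two parameter sets, here is a minimal standalone sketch that repeats the comparisons from the ps check excerpt above. The function name process_state and the dictionary names are mine, for illustration only; they are not part of Check_MK.

```python
def process_state(count, params):
    """Return 0 (OK), 1 (WARN) or 2 (CRIT).

    Hypothetical helper: it only repeats the two comparisons
    from the ps check excerpt, in the same order.
    """
    # Outside the warn interval: critical.
    if count > params["warnmax"] or count < params["warnmin"]:
        return 2
    # Inside the warn interval but outside the ok interval: warning.
    if count > params["okmax"] or count < params["okmin"]:
        return 1
    return 0


# Parameters from my original rule (disjoint intervals).
original = {"okmin": 1, "okmax": 10, "warnmin": 11, "warnmax": 15}
# Corrected rule: the warn interval contains the ok interval.
corrected = {"okmin": 1, "okmax": 10, "warnmin": 1, "warnmax": 15}

for count in (0, 7, 12, 16):
    print(count, process_state(count, original), process_state(count, corrected))
```

With 7 processes, the original parameters give state 2 (CRIT) because 7 is below warnmin = 11, while the corrected parameters give state 0 (OK). A count of 12 is a warning in both cases, and 0 or 16 is critical in both.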