Monit says start method not defined despite it being so


I have monit configured to check that my IRCd and its services are running. Recently the instance that runs all this restarted, and monit did not bring the processes back up.

It was configured to start on boot.

[root@ip-172-31-21-162 ec2-user]# chkconfig --list monit
monit           0:off   1:off   2:on    3:on    4:on    5:on    6:off

The control file

[root@ip-172-31-21-162 ec2-user]# cat /etc/monit.conf
set httpd port 2812
  allow 127.0.0.1
set daemon 60
  include /etc/monit.d/*

check process ircd with pidfile /home/ec2-user/inspircd/run/pid
  start program = "/usr/bin/perl /home/ec2-user/inspircd/run/inspircd start" 
    as uid "ec2-user" and gid "ec2-user"
    with timeout 30 seconds 

check process services with pidfile /home/ec2-user/anope/run/data/services.pid
  depends on ircd
  start program = "/bin/sh /home/ec2-user/anope/run/bin/anoperc start"
    as uid "ec2-user" and gid "ec2-user"
    with timeout 30 seconds

The syntax of this looks alright according to the documentation...

<START | STOP | RESTART> [PROGRAM] = "program"
    [[AS] UID <number | string>]
    [[AS] GID <number | string>]
    [[WITH] TIMEOUT <number> SECOND(S)]

And doing a check on it says the same

[ec2-user@ip-172-31-29-142 ~]$ sudo monit -t 
Control file syntax OK

Logs show that the start methods are not defined for these monitored processes, though!

[UTC May 14 04:39:51] error    : 'ircd' process is not running
[UTC May 14 04:39:51] error    : monit: Start or stop method not defined -- process ircd
[UTC May 14 04:39:51] error    : 'services' process is not running
[UTC May 14 04:39:51] error    : monit: Start or stop method not defined -- process services

Starting the processes manually through monit works for some reason

[root@ip-172-31-21-162 ec2-user]# monit start ircd
[root@ip-172-31-21-162 ec2-user]# monit status
The Monit daemon 5.2.5 uptime: 7h 14m 

Process 'ircd'
  status                            running
  monitoring status                 monitored
  pid                               26483
  parent pid                        1
  uptime                            3m 
...
  data collected                    Sat May 14 02:49:57 2016

Process 'services'
  status                            running
  monitoring status                 monitored
  pid                               26488
  parent pid                        1
  uptime                            3m 
...
  data collected                    Sat May 14 02:49:57 2016

Which is rather odd. When I stop those checked processes and restart monit with debug logging enabled, I see that it reports on the start programs.

Process Name          = ircd
 Pid file             = /home/ec2-user/inspircd/run/pid
 Monitoring mode      = active
 Start program        = '/home/ec2-user/inspircd/run/inspircd start' as uid 500 as gid 500 timeout 30 second(s)
 Existence            = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert

 Process Name          = services
 Pid file             = /home/ec2-user/anope/run/data/services.pid
 Monitoring mode      = active
 Start program        = '/home/ec2-user/anope/run/bin/anoperc start' as uid 500 as gid 500 timeout 30 second(s)
 Existence            = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Depends on Service   = ircd
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert

Any idea what in Glob's name is going on here?


Solution

  • According to the documented behavior of monit, a stop method must be defined alongside the start method for monit to bring a non-running process back up on its own.

    In active mode (the default), Monit will pro-actively monitor a service and in case of problems raise alerts and/or restart the service.

    -- Monit docs; service methods

    The action which is performed by Monit when process is not running was always "restart", but since there was no standalone "restart program" (until Monit 5.7), stop+start sequence was used.

    -- Monit issues; restart instead of start when a process is not running

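    Therefore, the solution is to add a stop program line to each checked process in the control file. Evidently, if you are running Monit 5.7 or later, you could alternatively use restart program. This would presumably also explain why running monit start ircd by hand worked: a manual start only needs the start method, whereas the automatic recovery is a restart and needs the stop method as well.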
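
    As a rough sketch of that fix, the two checks from the question could look like the following. The stop commands are an assumption: they suppose that the same inspircd and anoperc wrapper scripts accept a stop argument the way they accept start, so verify that by hand before relying on it.

        # Sketch only: the "stop" arguments below are assumed, not taken from the question.
        check process ircd with pidfile /home/ec2-user/inspircd/run/pid
          start program = "/usr/bin/perl /home/ec2-user/inspircd/run/inspircd start"
            as uid "ec2-user" and gid "ec2-user"
            with timeout 30 seconds
          stop program = "/usr/bin/perl /home/ec2-user/inspircd/run/inspircd stop"
            as uid "ec2-user" and gid "ec2-user"
            with timeout 30 seconds

        check process services with pidfile /home/ec2-user/anope/run/data/services.pid
          depends on ircd
          start program = "/bin/sh /home/ec2-user/anope/run/bin/anoperc start"
            as uid "ec2-user" and gid "ec2-user"
            with timeout 30 seconds
          stop program = "/bin/sh /home/ec2-user/anope/run/bin/anoperc stop"
            as uid "ec2-user" and gid "ec2-user"
            with timeout 30 seconds

    After editing, monit -t checks the syntax again and monit reload picks up the change. Note that monit -t only validates syntax, which is why it reported OK in the question even though the stop method was missing.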