I want to monitor some services so that they are restarted when they go down, and I found an amazing tool, monit. It works fine for Zookeeper, since I can use a condition like matching "QuorumPeerMain", as shown below in the monitrc file:
check process Zookeeper matching "QuorumPeerMain"
start program = "path/to/zkServer.sh start"
stop program = "path/to/zkServer.sh stop"
In the same way, I want to monitor these: hadoop, yarn and hbase
check process Hadoop matching "?"
start program = "startorstop.sh start" #equivalent to start-dfs.sh
stop program = "startorstop.sh stop" #equivalent to stop-dfs.sh
What should be written in place of the ? above?
These are the questions:
Hadoop runs NameNode, DataNode, and SecondaryNameNode. The Monit doc says that "The top-most matching parent with highest uptime is selected". For example, if DataNode goes down, monit still matches NameNode and won't try to restart hadoop. Another option was to use a pid file, but I am not able to find hadoop's pid file in /var/run/
zookeeper only, I want to start the remaining services like hbase, hadoop and yarn
I found a way to start NameNode, DataNode, and SecondaryNameNode independently using a shell script, i.e., hadoop-daemon.sh. Credits to @OneCricketeer for the comment, which helped me find a way to start these processes independently.
So in my monit conf, NameNode looks like:
check process NameNode matching "NameNode"
start program = "startorstop.sh start" #hadoop-daemon.sh start namenode
stop program = "startorstop.sh stop" #hadoop-daemon.sh stop namenode
group hadoop
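The conf above calls startorstop.sh without showing it. A minimal sketch of what such a wrapper might look like follows; the function names, the DRY_RUN switch, and the HADOOP_DAEMON_SH and DAEMON variables are all hypothetical, not taken from the original script. Monit tends to run programs with a very sparse environment, so in practice HADOOP_DAEMON_SH would likely need to be an absolute path.

```shell
#!/bin/sh
# Hypothetical stand-in for the startorstop.sh referenced above.
# HADOOP_DAEMON_SH and DAEMON are assumed names; adjust them to your install.
HADOOP_DAEMON_SH="${HADOOP_DAEMON_SH:-hadoop-daemon.sh}"
DAEMON="${DAEMON:-namenode}"

# Print the command line for an action; return 1 for an unknown action.
build_command() {
    case "$1" in
        start|stop) echo "$HADOOP_DAEMON_SH $1 $DAEMON" ;;
        *) return 1 ;;
    esac
}

# Run the command for an action. With DRY_RUN=1, only print it.
run_action() {
    cmd=$(build_command "$1") || { echo "usage: $0 {start|stop}" >&2; return 2; }
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        # Unquoted on purpose so the string splits into command and arguments.
        $cmd
    fi
}

# Act only when an argument is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    run_action "$1"
fi
```

Running it as DRY_RUN=1 ./startorstop.sh start prints the command instead of executing it, which is a cheap way to check the wrapper before handing it to monit. Separately, since matching takes a regular expression, the pattern "NameNode" may also match the SecondaryNameNode command line; running monit procmatch "NameNode" should show which processes the pattern picks up.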
For the other part of my question, I found the depends option. For more detail, see Service Dependencies in the Monit documentation. In my case, I wanted to restart HRegionServer whenever DataNode goes down, so the conf below works:
check process HRegionServer matching "HRegionServer"
start program = "startorstop.sh start" #hbase-daemon.sh start regionserver
stop program = "startorstop.sh stop" #hbase-daemon.sh stop regionserver
depends on DataNode
check process DataNode matching "DataNode"
start program = "startorstop.sh start" #hadoop-daemon.sh start datanode
stop program = "startorstop.sh stop" #hadoop-daemon.sh stop datanode
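The question also asks about yarn and the rest of hbase, which the confs above do not cover. A sketch along the same lines might look like the following. It assumes Hadoop 2.x-style yarn-daemon.sh and hbase-daemon.sh scripts and that the Java processes show up as ResourceManager, NodeManager, and HMaster; none of this is from the original post, and the /path/to/ prefixes are placeholders.

```
# Assumed process names and scripts; verify them against your own install.
check process ResourceManager matching "ResourceManager"
start program = "/path/to/yarn-daemon.sh start resourcemanager"
stop program = "/path/to/yarn-daemon.sh stop resourcemanager"
group yarn
check process NodeManager matching "NodeManager"
start program = "/path/to/yarn-daemon.sh start nodemanager"
stop program = "/path/to/yarn-daemon.sh stop nodemanager"
group yarn
check process HMaster matching "HMaster"
start program = "/path/to/hbase-daemon.sh start master"
stop program = "/path/to/hbase-daemon.sh stop master"
group hbase
```

Checking the file with monit -t before reloading should catch syntax mistakes in entries like these.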