I am using Apache Hadoop 2.7.1 on a cluster that consists of three nodes:

nn1 (master name node)
nn2 (standby name node)
dn1 (data node)

I have configured high availability with a nameservice, and ZooKeeper is running on all three nodes, with the leader elected on nn2. Initially, nn1 is active and nn2 is standby.
When I kill the NameNode process on nn1, nn2 becomes active, so automatic failover works in that case. But in the following scenario (applied while nn1 is active and nn2 is standby): when I turn off nn1 entirely (the whole machine goes down), nn2 stays standby and does not become active, so automatic failover does not happen,
with this noticeable error in the log:

Unable to trigger a roll of the active NN

(the active NN being nn1, which is of course now down).
Shouldn't automatic failover still happen with the two surviving journal nodes on nn2 and dn1? What could be the possible reasons?
My problem was solved by altering dfs.ha.fencing.methods in hdfs-site.xml to include not only SSH fencing but also a shell fencing method that always returns true:
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
Automatic failover fails if fencing fails. I specified two fencing methods; the second one, shell(/bin/true), always returns success. This works around the case where the machine hosting the active NameNode goes down entirely: the sshfence method cannot reach the host, so fencing fails and no failover is performed. To avoid this, the second method lets the failover proceed anyway.
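For reference, a fuller hdfs-site.xml fragment might look like the sketch below. Note that sshfence also requires dfs.ha.fencing.ssh.private-key-files so the ZKFC can SSH into the old active node and kill it; the key path shown is only an example and must match your own setup:

```xml
<!-- Fencing: try sshfence first; if the old active host is unreachable,
     fall back to shell(/bin/true) so failover can still proceed. -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>

<!-- Private key used by sshfence to log into the old active NameNode.
     The path below is an example; use the key of your hadoop user. -->
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
```

The order matters: methods are tried top to bottom, and failover proceeds as soon as one of them reports success.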
You can find details here: https://www.packtpub.com/books/content/setting-namenode-ha