Hadoop Namenode goes down almost everyday once.
FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) -
**Error: flush failed for required journal** (JournalAndStream(mgr=QJM to [< ip >:8485, < ip >:8485, < ip >:8485], stream=QuorumOutputStream starting at txid <>))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at
Can someone suggest what are the things that I need to look into for resolving this issue?
I am using VMs for the journal nodes and master nodes. Does it cause any issue?
In my case, this issue was caused due to the difference in the system time between the nodes of the cluster.
To keep the system time in sync, we can execute the commands below in each node.
sudo service ntpd stop
sudo ntpdate pool.ntp.org # Run this command multiple times
sudo service ntpd start
If hue is down, run below command on the hue server machine
sudo service hue start
If namenode is down, start the namenode.
Add a crontab for the root user on all the nodes of the environment.
or
Install VM tools, to keep the system time in sync.