Tags: hadoop, ip, cluster-computing, hosts, opennebula

Hadoop cluster showing only 1 live datanode


I am trying to configure a 3-node Apache Hadoop cluster. I already did it in a Docker environment and everything worked fine there. Now I am trying to move to an OpenNebula environment. I have 3 VMs with Ubuntu and Hadoop. When I start Hadoop using ./sbin/start-dfs.sh, it opens up datanodes on all the slaves and everything looks fine up to this point. But if I run "./bin/hdfs dfsadmin -report", it only shows me 1 live datanode. Check out the following:

[screenshot of the hdfs dfsadmin -report output showing only 1 live datanode]

Here is the result of the jps command on my master:

[screenshot of jps output on the master node]

And the jps command on a slave:

[screenshot of jps output on a slave node]

I am also able to SSH into all the machines. My guess is that something is wrong with my hosts file, because my slaves are not able to reach the master. Here is my master's /etc/hosts:

<my_ip_1> master
<my_ip_2> slave-1
<my_ip_3> slave-2

127.0.0.1       localhost

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

I have not modified my /etc/hostname file, but it looks like this, where "my_ip_1" represents the current IP of the VM:

<my_ip_1>.cloud.<domain>.de
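
A quick way to check whether name resolution is behaving, using the names from the /etc/hosts above (just a sketch; run it on the master and on both slaves), is:

# does "master" resolve to <my_ip_1> on this node?
getent hosts master
# which name does this node report for itself?
hostname -f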

Further, if I run the Hadoop Pi example using the command

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar pi 100 10000000

I get the following error in the slave-1 and slave-2 log files, while the master node solves the Pi problem on its own.

2015-08-25 15:27:03,249 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/<my_ip_1>:54310. Already tried 10 time(s); maxRetries=45
2015-08-25 15:27:23,270 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/<my_ip_1>:54310. Already tried 11 time(s); maxRetries=45
2015-08-25 15:27:43,290 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/<my_ip_1>:54310. Already tried 12 time(s); maxRetries=45
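
To see whether the NameNode RPC port is reachable at all, a minimal check (assuming port 54310 from the logs above, and that netcat is installed) is:

# on the master: is anything listening on the NameNode RPC port?
ss -ltn | grep 54310
# from slave-1 or slave-2: can we open a TCP connection to it?
nc -vz master 54310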

I have already tried: http://www.quora.com/The-master-node-shows-only-one-live-data-node-when-I-am-running-multi-node-cluster-in-Hadoop-What-should-I-do


Solution

  • OK, I managed to figure out the problem and found the fix.

    Problem:

    My slave nodes were not communicating with the master, so I checked the firewall settings on my machines (Ubuntu) using the following command:

    sudo ufw status verbose
    

    The output of the command

    Status: active
    Logging: on (low)
    Default: deny (incoming), allow (outgoing), disabled (routed)
    New profiles: skip
    

    Solution:

    So my machines were denying any incoming requests. To verify this assumption, I disabled the firewall:

    sudo ufw disable
    

    Before disabling the firewall, telnet <my_ip_1> 54310 was giving me a connection timeout. After disabling the firewall, it worked fine. Then I disabled the firewall on all the machines and ran the Hadoop Pi example again. It worked.

    Then I re-enabled the firewall on all machines:

    sudo ufw enable
    

    And I added firewall rules for incoming requests from my own IP addresses, like:

    sudo ufw allow from XXX.XXX.XXX.XXX
    

    Or, if you want to allow a whole /24 range (last octet 0-255):

    sudo ufw allow from XXX.XXX.XXX.0/24
    

    Since I had 3 machines, on each machine I added the IP addresses of the other 2. Everything went fine.
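
    If you prefer not to open all traffic between the nodes, ufw can also scope a rule to a single port. A sketch for the NameNode RPC port used above (other Hadoop daemon ports would need similar rules):

    # allow only the slaves to reach the NameNode RPC port on the master
    sudo ufw allow from <my_ip_2> to any port 54310 proto tcp
    sudo ufw allow from <my_ip_3> to any port 54310 proto tcp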