I have a 3-node Hadoop cluster running on DigitalOcean droplets.
Whenever I run a MapReduce streaming job and a worker node is selected to run the ApplicationMaster, the job hangs while trying to connect to the ResourceManager. The datanode log shows that it tries to connect to 0.0.0.0:
INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s);
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s);
That is the default value of the yarn.resourcemanager.hostname property.
However, I have specified this property in yarn-site.xml on both of my worker nodes:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
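To make the relationship between this property and the 0.0.0.0:8030 address in the log concrete, here is a minimal Python sketch. It assumes the documented yarn-default.xml behaviour, where yarn.resourcemanager.scheduler.address defaults to ${yarn.resourcemanager.hostname}:8030 and the hostname defaults to 0.0.0.0. The function name scheduler_address and the two sample documents are made up for illustration, and the code only approximates Hadoop's variable substitution; it is not how Hadoop itself loads configuration.

```python
import xml.etree.ElementTree as ET

# Defaults as documented in yarn-default.xml (assumed here, not read from Hadoop).
DEFAULT_RM_HOSTNAME = "0.0.0.0"
DEFAULT_SCHEDULER_PORT = 8030


def scheduler_address(yarn_site_xml: str) -> str:
    """Return the scheduler address implied by a yarn-site.xml document."""
    root = ET.fromstring(yarn_site_xml)
    props = {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }
    # An explicitly configured scheduler address takes precedence.
    explicit = props.get("yarn.resourcemanager.scheduler.address")
    if explicit:
        return explicit
    # Otherwise the address is derived from the ResourceManager hostname.
    host = props.get("yarn.resourcemanager.hostname", DEFAULT_RM_HOSTNAME)
    return f"{host}:{DEFAULT_SCHEDULER_PORT}"


# Hypothetical sample documents: one with the property set, one without it.
WITH_HOSTNAME = """<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-master</value>
  </property>
</configuration>"""

EMPTY = "<configuration></configuration>"

print(scheduler_address(WITH_HOSTNAME))  # hadoop-master:8030
print(scheduler_address(EMPTY))          # 0.0.0.0:8030
```

If the process that runs the ApplicationMaster does not see the property, the second case applies, which would match the address in the log above.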
On all of my nodes, /etc/hosts looks like this, so hadoop-master should resolve to the correct IP address:
#127.0.1.1 hadoop-worker1 hadoop-worker1
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
#::1 ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
#ff02::3 ip6-allhosts
165.22.19.161 hadoop-master
165.22.19.154 hadoop-worker1
165.22.19.158 hadoop-worker2
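As a quick sanity check on a hosts file like this one, the following Python sketch lists every hostname that is mapped to a loopback address. The helper name loopback_hostnames is hypothetical, and the sample text is an abbreviated copy of the file above. The idea is only that no cluster hostname should appear in the result.

```python
import ipaddress


def loopback_hostnames(hosts_text):
    """Return hostnames that an /etc/hosts-style text maps to loopback addresses."""
    found = []
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        if ipaddress.ip_address(address).is_loopback:
            found.extend(names)
    return found


# Abbreviated copy of the hosts file shown above.
HOSTS = """\
#127.0.1.1 hadoop-worker1 hadoop-worker1
127.0.0.1 localhost
165.22.19.161 hadoop-master
165.22.19.154 hadoop-worker1
165.22.19.158 hadoop-worker2
"""

print(loopback_hostnames(HOSTS))  # ['localhost']
```

Only localhost is reported here, because the 127.0.1.1 line is commented out.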
I also checked which configuration was actually loaded by opening the worker node's web interface at hadoop-worker1:9864:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
<final>false</final>
<source>yarn-site.xml</source>
</property>
Furthermore, I ran a YARN command from one of the worker nodes, and it contacts the ResourceManager correctly:
hadoop@hadoop-worker1:/opt/hadoop$ yarn node --list
2019-06-15 18:47:42,119 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/165.22.19.161:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
hadoop-worker2:40673 RUNNING hadoop-worker2:8042 0
hadoop-worker1:41875 RUNNING hadoop-worker1:8042 1
hadoop-master:40075 RUNNING hadoop-master:8042 0
hadoop@hadoop-worker1:/opt/hadoop$
I am unsure what to do. I suspect that streaming jobs are not loading the settings correctly. Any help would be appreciated, as I have been stuck on this issue for two days.
I have added the -D yarn.resourcemanager.hostname=hadoop-master flag to the mapred streaming command, and it seems to work now.
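For anyone assembling the same command from a script, here is a minimal Python sketch of the argument order. The input and output paths and the mapper and reducer script names are hypothetical placeholders, not the ones from my job, and build_streaming_command is just an illustrative helper. The one thing it is meant to show is that generic options such as -D go before the streaming options.

```python
import shlex


def build_streaming_command(rm_host, input_path, output_path, mapper, reducer):
    """Build the argv for `mapred streaming` with the ResourceManager host override."""
    return [
        "mapred", "streaming",
        # Generic options such as -D must come before the streaming options.
        "-D", f"yarn.resourcemanager.hostname={rm_host}",
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


cmd = build_streaming_command(
    "hadoop-master",
    "/user/hadoop/input",   # hypothetical HDFS input path
    "/user/hadoop/output",  # hypothetical HDFS output path
    "mapper.py",            # hypothetical mapper script
    "reducer.py",           # hypothetical reducer script
)

# Print the command as a single shell-quoted line instead of running it.
print(shlex.join(cmd))
```

The sketch only prints the command. On a machine with Hadoop installed, the list could be passed to subprocess.run(cmd, check=True) to launch the job.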