I have a Hadoop 3.2.2 cluster with 1 NameNode/ResourceManager and 3 DataNodes/NodeManagers.
This is my yarn-site.xml config:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>bd-1</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
When I run the example job
python mr_word_count.py -r hadoop -v hdfs:///user/hduser/testme.txt
I get this error:
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:326)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:539)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
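For context, mr_word_count.py is essentially the stock mrjob word-count example; a minimal sketch along those lines (not my exact file, but the same structure) is:

from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        # called once per input line; emit counts for chars, words and lines
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        # sum all the counts emitted for each key
        yield key, sum(values)

if __name__ == '__main__':
    MRWordFrequencyCount.run()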
What I have done so far:
- Ran the script locally: python mr_word_count.py testme.txt
- Added #!/usr/bin/python and # -*- coding: utf-8 -*- to the head of the script
- I can define the Python binary in .mrjob.conf (see the sketch below), but then the error code changes to 126
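A minimal sketch of what I mean by that, assuming the usual .mrjob.conf layout (the interpreter path is just an example; it has to exist and be executable on every node, since exit code 126 usually means the configured binary was found but could not be executed):

runners:
  hadoop:
    # must point to an executable Python interpreter on every NodeManager host
    python_bin: /usr/bin/python3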
In the console I see map 100% reduce 100%.
In the web UI I also see that the job is running, and CPU and memory are being consumed by it.
I have been googling and reading Stack Overflow and the Hadoop documentation for four days now, for many hours, without a result. Any ideas what could be wrong?
I forgot to install mrjob on all nodes...
Running this on all nodes fixed the problem:
pip3 install mrjob
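A quick sanity check on each node (just one way to verify it, any equivalent check works) is to confirm the module imports under the interpreter the tasks use:

python3 -c "import mrjob; print(mrjob.__file__)"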