According to the Hortonworks documentation, the way to run Hadoop jobs in "uber mode" is to configure mapred-site.xml like so:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.maxmaps</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.maxreduces</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.maxbytes</name>
    <value>134217728</value>
  </property>
</configuration>
For mapreduce.job.ubertask.maxbytes, full disclosure, I didn't really know what value to put, so I copied it from the dfs.block.size parameter in hdfs-site.xml:
<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
  <description>Block size</description>
</property>
I originally chose that block size on a hunch: I suspected one reason my job was failing was that the input data, which needs to be atomic (in the sense that it can't be broken up and fed into the mapper piecemeal), was being split up in HDFS.
Nevertheless, even though these settings are configured the way the Hortonworks documentation, and others, say is sufficient to run the job in "uber mode", the job does not in fact execute in that mode, as you can see below:
Is there something wrong with the settings as I've configured them that is preventing my job from executing in uber mode?
The configuration settings in the OP are OK. The thing about uber mode is that, with mapreduce.job.ubertask.maxmaps set to 1, the job can have at most one map task, which in my case meant a single input file, not multiple files as it was before. See here:
17/10/12 20:42:42 INFO input.FileInputFormat: Total input files to process : 1
17/10/12 20:42:43 INFO mapreduce.JobSubmitter: number of splits:1
17/10/12 20:42:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1507833515636_0005
17/10/12 20:42:44 INFO impl.YarnClientImpl: Submitted application application_1507833515636_0005
17/10/12 20:42:44 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1507833515636_0005/
17/10/12 20:42:44 INFO mapreduce.Job: Running job: job_1507833515636_0005
17/10/12 20:42:49 INFO mapreduce.Job: Job job_1507833515636_0005 running in uber mode : true
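If a job genuinely needs more than one input file, the setting to look at is mapreduce.job.ubertask.maxmaps rather than the file count itself. Here is a minimal sketch, untested on the cluster above. It assumes each input file is small enough to become a single split and that the combined input stays under mapreduce.job.ubertask.maxbytes. The value 9 is the default from mapred-default.xml, and as far as I know this limit can only be lowered from that default, not raised:

```xml
<!-- Hypothetical variant of the mapred-site.xml entry from the question. -->
<!-- Assumption: the job has at most 9 small input files, one split each. -->
<property>
  <name>mapreduce.job.ubertask.maxmaps</name>
  <value>9</value>
</property>
```

With the value of 1 from the question, any job that produces a second split falls back to normal (non-uber) execution.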
or, straight from the horse's mouth: