After reading Hadoop speculative task execution, I am trying to turn off speculative execution using the new Java API, but it has no effect.
This is my Main class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class Main {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // old API:
        // conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        // new API:
        conf.setBoolean("mapreduce.map.speculative", false);

        int res = ToolRunner.run(conf, new LogParserMapReduce(), args);
        System.exit(res);
    }
}
And my LogParserMapReduce class starts like this:
@Override
public int run(String[] args) throws Exception {
    Configuration conf = super.getConf();

    /*
     * Instantiate a Job object for your job's configuration.
     */
    Job job = Job.getInstance(conf);
But when I look at the logs I see:
2014-04-24 10:06:21,418 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat (main): Total input paths to process : 16
2014-04-24 10:06:21,574 INFO org.apache.hadoop.mapreduce.JobSubmitter (main): number of splits:26
If I understand correctly, this means that speculative execution is still on; otherwise, why would there be 26 splits if I only have 16 input files? Am I mistaken?
Note: I believe I am using the new API, as I see these warnings in the log:
2014-04-24 10:06:21,590 INFO org.apache.hadoop.conf.Configuration.deprecation (main): mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
"16 file = 16 Mappers" that is a wrong assumption.
"16 Files = Minimum 16 Mappers" This is correct.
If some of the 16 files are bigger than the block size they are split to multiple mappers. Hence your 16 files generating 26 Mappers may not be because of speculative execution.
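As a rough illustration of the split arithmetic (the file sizes and the 64 MB block size below are made-up numbers, not taken from your job; the real computation in FileInputFormat also honors the min/max split-size settings and a small slop factor), 13 small files plus 3 oversized ones can already add up to 26 splits:

public class SplitCountSketch {

    // Mirrors FileInputFormat's computeSplitSize(blockSize, minSize, maxSize).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockSize = 64 * mb;                                         // assumed HDFS block size
        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE); // 64 MB with default min/max

        // Hypothetical input: 13 files below the block size, 3 above it.
        long[] fileSizes = new long[16];
        for (int i = 0; i < 13; i++) fileSizes[i] = 10 * mb;   // 1 split each
        fileSizes[13] = 200 * mb;                              // 4 splits
        fileSizes[14] = 300 * mb;                              // 5 splits
        fileSizes[15] = 256 * mb;                              // 4 splits

        long totalSplits = 0;
        for (long size : fileSizes) {
            // Roughly ceil(size / splitSize) splits per file; FileInputFormat
            // additionally allows the last split to be up to 10% oversized.
            totalSplits += (size + splitSize - 1) / splitSize;
        }
        System.out.println(fileSizes.length + " files -> " + totalSplits + " splits");
        // Prints: 16 files -> 26 splits
    }
}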
Setting the value in the Configuration certainly works. You can verify this by checking your job's job.xml.
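In addition to job.xml, you can confirm it from the driver before submitting. A minimal sketch (the class name SpeculativeCheck is only for illustration; it assumes the Hadoop 2.x new API, where org.apache.hadoop.mapreduce.Job also has dedicated setters that write the same mapreduce.*.speculative keys):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculativeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.map.speculative", false);

        Job job = Job.getInstance(conf);
        // Equivalent to setting the properties on the Configuration directly.
        job.setMapSpeculativeExecution(false);
        job.setReduceSpeculativeExecution(false);   // reducers are controlled separately

        // Read the values back from the job's configuration -- these are the
        // values that end up in the submitted job.xml.
        System.out.println("mapreduce.map.speculative    = "
                + job.getConfiguration().getBoolean("mapreduce.map.speculative", true));
        System.out.println("mapreduce.reduce.speculative = "
                + job.getConfiguration().getBoolean("mapreduce.reduce.speculative", true));
    }
}

In your code, the same calls would go into LogParserMapReduce.run(), right after Job.getInstance(conf).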