java hadoop mapreduce stringtokenizer

Failed to ".add(StringTokenizer.nextToken())" in an ArrayList<String> inside Hadoop's MapReducer code


I am trying to add StringTokenizer.nextToken() to an ArrayList within my Hadoop MapReduce code. The code worked fine and produced an output file when run, but once I added a StringTokenizer line it suddenly broke.

Here's my code:

public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {

            List<String> texts = new ArrayList<String>();  

            StringTokenizer itr = new StringTokenizer(value.toString(), "P");

            while (itr.hasMoreTokens()) {
                System.out.println(itr.nextToken());
                texts.add(itr.nextToken());  //The code broke here
            }
      }

Note that I haven't yet added Hadoop's Text class to write the output in this code, but it works in my previous code.

Here's my Reducer

   public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

Here's the .main

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(JobCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }

Note: I've also tried using a plain array and it still broke.

The project runs on the Java 8 JDK on macOS and imports Maven's HadoopCommon version 3.3.0 and HadoopCore version 1.2.0.

Here's my error log:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/domesama/.m2/repository/org/apache/hadoop/hadoop-core/1.2.1/hadoop-core-1.2.1.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/09/15 14:18:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/09/15 14:18:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
20/09/15 14:18:07 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
20/09/15 14:18:07 INFO input.FileInputFormat: Total input paths to process : 1
20/09/15 14:18:07 WARN snappy.LoadSnappy: Snappy native library not loaded
20/09/15 14:18:07 INFO mapred.JobClient: Running job: job_local1465674096_0001
20/09/15 14:18:07 INFO mapred.LocalJobRunner: Waiting for map tasks
20/09/15 14:18:07 INFO mapred.LocalJobRunner: Starting task: attempt_local1465674096_0001_m_000000_0
20/09/15 14:18:07 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
20/09/15 14:18:07 INFO mapred.MapTask: Processing split: file:/Users/domesama/Desktop/Github Respositories/HadoopMapReduce/input/SampleFile.txt:0+1891
20/09/15 14:18:07 INFO mapred.MapTask: io.sort.mb = 100
20/09/15 14:18:07 INFO mapred.MapTask: data buffer = 79691776/99614720
20/09/15 14:18:07 INFO mapred.MapTask: record buffer = 262144/327680
20/09/15 14:18:07 INFO mapred.MapTask: Starting flush of map output
20/09/15 14:18:07 INFO mapred.LocalJobRunner: Map task executor complete.
20/09/15 14:18:07 WARN mapred.LocalJobRunner: job_local1465674096_0001
java.lang.Exception: java.util.NoSuchElementException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.util.NoSuchElementException
    at java.base/java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
    at JobCount$TokenizerMapper.map(JobCount.java:50)
    at JobCount$TokenizerMapper.map(JobCount.java:20)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:830)
,84,01,02600,01,1007549,00065,19,1,,,2,2,2,2,2,,,2,,2,,,,1,2,2,2,2,2,2,0000000,,,,2,5,,,,,,1,4,,,,,,,,,,2,5,2,2,3,000000,00000,17,000000,2,15,19,0000000,2,00000,00000,0000000,,3,,2,,4,999,999,,2,,,6,,,1,01,,,,,,,6,,1,,0,,,000000000,000000000,028,,,,1,2,1,1,01,001,0,0,0,0,1,0,0,1,0,,,,,,,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,0,,0,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,00005,00127,00065,00066,00069,00005,00120,00066,00063,00005,00067,00006,00005,00137,00124,00065,00066,00064,00063,00006,00131,00006,00062,00063,00060,00126,00006,00066,00068,00120,00066,00126,00115,00005,00005,00063,00066,00066,00062,00005,00118,00006,00064,00066,00062,00124,00006,00063,00068,00132,00062,00119,00126,00006,00005,00068,00072,00065,00066,00125,00005,00123,00062,00064,00065,00006,00123,00065,00067,00006,00068,00006,00005,00127,00119,00063,00068,00067,00064,00122
20/09/15 14:18:08 INFO mapred.JobClient:  map 0% reduce 0%
20/09/15 14:18:08 INFO mapred.JobClient: Job complete: job_local1465674096_0001
20/09/15 14:18:08 INFO mapred.JobClient: Counters: 0

The System.out.println(itr.nextToken()); call did print its token, but the job seems to fail when it then executes this line:

texts.add(itr.nextToken());  //The code broke here

Perhaps I need something like async/await (as in JS) in my code?


Solution

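  • With StringTokenizer, every call to nextToken() must be guarded by a hasMoreTokens() check, because each nextToken() call consumes one token. Your loop checks once but calls nextToken() twice: the println consumes one token, and texts.add(...) then asks for the next one. When only one token is left, that second call throws NoSuchElementException. This is not a timing problem, so nothing like async/await is needed.

    The fix is to call nextToken() only once per loop iteration and reuse the result: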

    while (itr.hasMoreTokens()) {
        String token = itr.nextToken();  // one call for each hasMoreTokens
        System.out.println(token);
        texts.add(token);  
    }
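
    To see the behaviour outside Hadoop, here is a minimal standalone sketch. The class name TokenizerDemo, the helper methods, and the sample input "aPbPc" are made up for illustration and are not part of the original job. It reproduces the double-call pattern from the question and contrasts it with the single-call loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical demo class; not part of the original JobCount code.
public class TokenizerDemo {

    // Correct pattern: exactly one nextToken() per hasMoreTokens() check.
    static List<String> collectTokens(String line, String delimiters) {
        List<String> texts = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line, delimiters);
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken(); // consume the token once
            texts.add(token);               // reuse the same value
        }
        return texts;
    }

    // Pattern from the question: two nextToken() calls per check.
    // Returns true if the loop ran into NoSuchElementException.
    static boolean doubleCallFails(String line, String delimiters) {
        List<String> texts = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line, delimiters);
        try {
            while (itr.hasMoreTokens()) {
                itr.nextToken();            // stands in for the println call
                texts.add(itr.nextToken()); // unguarded second call
            }
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(collectTokens("aPbPc", "P"));
        System.out.println(doubleCallFails("aPbPc", "P")); // odd number of tokens
        System.out.println(doubleCallFails("aPb", "P"));   // even number of tokens
    }
}
```

    With an odd number of tokens the double-call loop throws. With an even number it does not throw, but only every second token ends up in the list, so the bug can also go unnoticed.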