I want to use a combiner in my MapReduce code, say WordCount.
How should I implement it?
What sort of data is being passed to the reducer from the combiner?
It would be great if someone could provide the code for both the Combiner and the Reducer.
It would be even better if you could explain how the combiner works.
I am new to MapReduce and still at the learning stage.
Thanks in advance :)
A Combiner, also known as a semi-reducer, is an optional class that runs on the map side.
Its main function is to summarize the map output records that share the same key.
It sits between the Map class and the Reduce class and reduces the volume of data transferred between them.
Explanation with sample data:
MAP Input:
What do you mean by Object
What do you know about Java
What is Java Virtual Machine
How Java enabled High Performance
MAP output
<What,1> <do,1> <you,1> <mean,1> <by,1> <Object,1>
<What,1> <do,1> <you,1> <know,1> <about,1> <Java,1>
<What,1> <is,1> <Java,1> <Virtual,1> <Machine,1>
<How,1> <Java,1> <enabled,1> <High,1> <Performance,1>
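This MAP output is grouped by key on the map side and passed as input to the Combiner (assuming here that all four lines are processed by one map task).
Combiner input (after grouping)
<What,[1,1,1]> <do,[1,1]> <you,[1,1]> <mean,[1]> <by,[1]> <Object,[1]>
<know,[1]> <about,[1]> <Java,[1,1,1]>
<is,[1]> <Virtual,[1]> <Machine,[1]>
<How,[1]> <enabled,[1]> <High,[1]> <Performance,[1]>
Combiner output
<What,3> <do,2> <you,2> <mean,1> <by,1> <Object,1>
<know,1> <about,1> <Java,3>
<is,1> <Virtual,1> <Machine,1>
<How,1> <enabled,1> <High,1> <Performance,1>
This combiner output is passed as input to the Reducer. With a single map task the Reducer receives these sums as they are; with several map tasks, each task sends its own partial sums and the Reducer adds them up per key.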
Reducer Output
<What,3> <do,2> <you,2> <mean,1> <by,1> <Object,1>
<know,1> <about,1> <Java,3>
<is,1> <Virtual,1> <Machine,1>
<How,1> <enabled,1> <High,1> <Performance,1>
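To make the flow above concrete, here is a minimal plain-Java sketch that imitates it without any Hadoop dependency. It is only a simulation: the class name `CombinerFlowDemo`, the helper methods `map`, `sumByKey` and `wordCount`, and the choice of two map tasks with two lines each are all assumptions made for illustration, not part of the Hadoop API. The same summing method is used for the combine step and the reduce step, which mirrors using one class for both.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a WordCount job; no Hadoop classes are used.
class CombinerFlowDemo {

    // Map step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                out.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return out;
    }

    // Sum step: group pairs by key and add up their values.
    // Used both as the combiner (per map task) and as the reducer.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    // Each inner list plays the role of one map task's input split (an assumption).
    static Map<String, Integer> wordCount(List<List<String>> splits) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (List<String> split : splits) {
            // Combiner: runs on one map task's output only, producing partial sums.
            Map<String, Integer> combined = sumByKey(map(split));
            System.out.println("combiner output: " + combined);
            shuffled.addAll(combined.entrySet());
        }
        // Reducer: adds the partial sums coming from all map tasks.
        return sumByKey(shuffled);
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("What do you mean by Object", "What do you know about Java"),
            Arrays.asList("What is Java Virtual Machine", "How Java enabled High Performance"));
        System.out.println("reducer output: " + wordCount(splits));
    }
}
```

With this assumed split, the first task's combiner emits <What,2> and the second emits <What,1>, so the reducer receives two partial sums for What and adds them to <What,3>. The data passed from combiner to reducer therefore has the same (word, count) shape as the map output, just with fewer records.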
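If you are using Java, the code below sets the Combiner and the Reducer to the same class. This works for word count because summing is commutative and associative, so partial sums can be summed again. Hadoop may run the combiner zero, one, or several times, so the job's result must not depend on it.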
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
Have a look at the working Java example at tutorialspoint.