
Why combiner output records = 0?


I am using MultipleInputs, so I have two mappers. I also have one combiner:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

class JoinCombiner extends MapReduceBase implements
        Reducer<TextPair, Text, TextPair, Text> {

    @Override
    public void reduce(TextPair key, Iterator<Text> values,
            OutputCollector<TextPair, Text> output, Reporter reporter)
            throws IOException {

        // Take the first value as the node id and emit the remaining values
        // under a new key built from it (the incoming key is discarded).
        Text nodeId = new Text(values.next());
        while (values.hasNext()) {
            Text node = values.next();
            TextPair outValue = new TextPair(nodeId.toString(), "0");
            output.collect(outValue, node);
        }
    }
}
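
For context, the job is wired up roughly like this with the old mapred API (the driver, mapper, reducer and path names below are placeholders; only JoinCombiner and TextPair come from my code):

JobConf conf = new JobConf(JoinJob.class);

// two mappers over two inputs via MultipleInputs
MultipleInputs.addInputPath(conf, new Path(args[0]),
        TextInputFormat.class, FirstMapper.class);
MultipleInputs.addInputPath(conf, new Path(args[1]),
        TextInputFormat.class, SecondMapper.class);

// the combiner from above, plus a reducer
conf.setCombinerClass(JoinCombiner.class);
conf.setReducerClass(JoinReducer.class);

conf.setMapOutputKeyClass(TextPair.class);
conf.setMapOutputValueClass(Text.class);

FileOutputFormat.setOutputPath(conf, new Path(args[2]));
JobClient.runJob(conf);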

When I use this class as the reducer, everything works. But if I use it as a combiner, I see this in the log:

Combine input records=6
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=30
Reduce input records=0
Reduce output records=0

So, no output from the combiner means no input for the reducer. I can't understand why. Please share an explanation if you have any ideas. Thanks!


Solution

  • A combiner gets executed only if you have a reducer. Try setting both the combiner and the reducer to the same class (if that's possible), and also consider setting the number of reduce tasks.
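
    For example (old mapred API, class name illustrative):

    conf.setCombinerClass(MyReducer.class);   // same class in both roles
    conf.setReducerClass(MyReducer.class);
    conf.setNumReduceTasks(1);                // make sure at least one reduce task runs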

    UPDATE: You're trying to change the key in the combiner. The purpose of a combiner is to group the values of the same key together locally, to reduce network traffic.
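
    In other words, the combiner has to emit the same key (and key/value types) it received. A key-preserving sketch of your combiner would look roughly like this (whether the values can usefully be pre-combined here is a separate question):

    // same imports as your JoinCombiner above
    class KeyPreservingCombiner extends MapReduceBase implements
            Reducer<TextPair, Text, TextPair, Text> {

        @Override
        public void reduce(TextPair key, Iterator<Text> values,
                OutputCollector<TextPair, Text> output, Reporter reporter)
                throws IOException {
            while (values.hasNext()) {
                // pass the incoming key through unchanged
                output.collect(key, values.next());
            }
        }
    }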

    From the Hadoop Tutorial on YDN:

    Instances of the Combiner class are run on every node that has run map tasks. The Combiner will receive as input all data emitted by the Mapper instances on a given node. The output from the Combiner is then sent to the Reducers, instead of the output from the Mappers.

    Based on my experience, that is not totally true. Hadoop sends to the reducer only the keys that are emitted by the mapper - meaning that if you have a combiner in between, it should emit the same key as the mapper and just reduce the number of values associated with that key. IMO, changing the keys in the combiner results in unexpected behavior. To illustrate a simple use case for combiners, consider a word counter.

    Mapper1 emits:

    hi 1
    hello 1
    hi 1
    hi 1
    hello 1
    

    Mapper2 emits:

    hello 1
    hi 1
    

    You have seven output records. Now, if you want to reduce the number of records locally (meaning on the same machine where the mapper is running), then having a combiner will give you something like this:

    Combiner1 emits:

    hi 3
    hello 2
    

    Combiner2 emits:

    hello 1
    hi 1
    

    Notice that the combiner did not change the key. Now, at the reducer, you will get the values like this:

    Reducer1: key: hi, values: <3, 1> and you emit hi 4

    Because you have only one reducer, the same reducer will be called again, this time with a different key.

    Reducer1: key: hello, values: <2, 1> and you emit hello 3

    The final output would be as follows:

    hello 3
    hi 4
    

    The output is sorted on the basis of the keys emitted by the mapper. You can choose to change the key emitted by the reducer, but your output will not be sorted by that key (by default). Hope that helps.
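
    To make the word-count example concrete, here is a minimal sum reducer in the old mapred API that can safely double as a combiner, because it emits exactly the key it receives:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    class WordCountReducer extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();            // add up partial counts
            }
            output.collect(key, new IntWritable(sum)); // same key goes back out
        }
    }

    Registered with conf.setCombinerClass(WordCountReducer.class) and conf.setReducerClass(WordCountReducer.class), it first folds each mapper's local output (hi 3, hello 2, ...) and the reducer then folds those partial sums into the final counts.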