hadoopmapreducehadoop-partitioning

hadoop map reduce secondary sorting


Can any one explain me how secondary sorting works in hadoop ?
Why must one use GroupingComparator and how does it work in hadoop ?

I was going through the link given below and got doubt on how groupcomapator works.
Can any one explain me how grouping comparator works?

http://www.bigdataspeak.com/2013/02/hadoop-how-to-do-secondary-sort-on_25.html


Solution

  • Grouping Comparator

    Once the data reaches a reducer, all data is grouped by key. Since we have a composite key, we need to make sure records are grouped solely by the natural key. This is accomplished by writing a custom GroupPartitioner. We have a Comparator object only considering the yearMonth field of the TemperaturePair class for the purposes of grouping the records together.

    public class YearMonthGroupingComparator extends WritableComparator {
    
        public YearMonthGroupingComparator() {
            super(TemperaturePair.class, true);
        }
    
        @Override
        public int compare(WritableComparable tp1, WritableComparable tp2) {
            TemperaturePair temperaturePair = (TemperaturePair) tp1;
            TemperaturePair temperaturePair2 = (TemperaturePair) tp2;
            return temperaturePair.getYearMonth().compareTo(temperaturePair2.getYearMonth());
        }
    }
    

    Here are the results of running our secondary sort job:

    new-host-2:sbin bbejeck$ hdfs dfs -cat secondary-sort/part-r-00000
    

    190101 -206

    190102 -333

    190103 -272

    190104 -61

    190105 -33

    190106 44

    190107 72

    190108 44

    190109 17

    190110 -33

    190111 -217

    190112 -300

    While sorting data by value may not be a common need, it’s a nice tool to have in your back pocket when needed. Also, we have been able to take a deeper look at the inner workings of Hadoop by working with custom partitioners and group partitioners. Refer this link also..What is the use of grouping comparator in hadoop map reduce