
Hadoop Reducer Custom Writable


I have the following Reducer class:

public class CompanyMinMaxReducer extends Reducer<Text, DateClosePair, Text, Text> {
    private Text rText = new Text();

    public void reduce(Text key, Iterable<DateClosePair> values, Context context)
            throws IOException, InterruptedException {

        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        LongWritable minDay = new LongWritable();
        LongWritable maxDay = new LongWritable();

        for (DateClosePair val : values) {
            LongWritable tempDate = val.getDate();
            DoubleWritable tempClose = val.getClose();

            if (tempDate.compareTo(maxDay) > 0) {
                maxDay = tempDate;
            } else if (tempDate.compareTo(minDay) < 0) {
                minDay = tempDate;
            }

            if (tempClose.get() > max) {
                max = (int) tempClose.get();
            } else if (tempClose.get() < min) {
                min = (int) tempClose.get();
            }
        }

        String minDayFinal = "" + new SimpleDateFormat("yyyy").format(new Date(minDay.get()));
        String maxDayFinal = "" + new SimpleDateFormat("yyyy").format(new Date(maxDay.get()));
        String output = minDayFinal + " - " + maxDayFinal + " MIN: " + min + " MAX: " + max;

        rText.set(output);
        context.write(key, rText);
    }
}

My dataset is in the following format:

exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close

For example:

NASDAQ,AAPL,1970-10-22, ... 

I am asked to write a new MapReduce program that for each company provides the range of years it has been present in the stock market, and the maximum and minimum closing value obtained by the stock.

My program produces plausible minimum and maximum closing prices, but the start year is 1970 for every company:

AAON    1970 - 2002 MIN: 1 MAX: 35
AATI    1970 - 2010 MIN: 2 MAX: 15
ABCO    1970 - 2004 MIN: 14 MAX: 69
ABCW    1970 - 2007 MIN: 0 MAX: 53
ABII    1970 - 2008 MIN: 25 MAX: 78
ABIO    1970 - 1999 MIN: 0 MAX: 139
ABMC    1970 - 2004 MIN: 0 MAX: 6
ABTL    1970 - 2004 MIN: 0 MAX: 58
ACAD    1970 - 2009 MIN: 0 MAX: 17
ACAP    1970 - 2005 MIN: 15 MAX: 55
ACAT    1970 - 2009 MIN: 3 MAX: 29
ACCL    1970 - 1997 MIN: 3 MAX: 104
ACEL    1970 - 1998 MIN: 0 MAX: 10
ACET    1970 - 2004 MIN: 4 MAX: 27
ACFC    1970 - 2008 MIN: 1 MAX: 20
ACGL    1970 - 1997 MIN: 11 MAX: 80
ACLI    1970 - 2006 MIN: 2 MAX: 77
ACLS    1970 - 2001 MIN: 0 MAX: 30

DateClosePair is a custom Writable that I wrote following the typical examples found on the web.

It is very odd that the minimum and maximum closing prices are correct but the minimum and maximum dates are wrong.

Any thoughts?


Solution

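  • I have resolved the problem, which turned out to be caused by aliasing.

    Writing maxDay = tempDate; does not copy the date; it makes maxDay point to the same object as tempDate. Hadoop reuses the same value object on each iteration of the reducer's values, so that object is overwritten when the next record is read. Instead, I should copy the value by calling .set() (and likewise for minDay).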

    Solution:

    maxDay.set(tempDate.get());
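
The aliasing effect can be reproduced without a cluster. The sketch below is only an illustration under stated assumptions: MutableLong is a hypothetical stand-in for LongWritable, and reusingIterable is a hypothetical helper that imitates the reducer handing back one reused object. It also assumes that all dates fall after the epoch. In that case a minDay created with the no-argument constructor starts at 0 (which formats as 1970 in UTC) and is never replaced, so the fixed version additionally starts minDay at Long.MAX_VALUE and uses two independent if checks rather than else if.

```java
import java.util.Iterator;

class AliasingDemo {
    // Hypothetical stand-in for LongWritable: a mutable box around a long.
    static final class MutableLong {
        private long value; // defaults to 0, like a no-argument LongWritable
        MutableLong() {}
        MutableLong(long value) { this.value = value; }
        long get() { return value; }
        void set(long value) { this.value = value; }
    }

    // Imitates the reducer's values: one shared object, refilled on every next().
    static Iterable<MutableLong> reusingIterable(final long... dates) {
        final MutableLong shared = new MutableLong();
        return () -> new Iterator<MutableLong>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < dates.length; }
            @Override public MutableLong next() { shared.set(dates[i++]); return shared; }
        };
    }

    // Mirrors the logic from the question: reference assignment, zero-initialised bounds.
    static long[] buggyRange(Iterable<MutableLong> values) {
        MutableLong minDay = new MutableLong();
        MutableLong maxDay = new MutableLong();
        for (MutableLong tempDate : values) {
            if (tempDate.get() > maxDay.get()) {
                maxDay = tempDate;          // alias: maxDay now tracks the shared object
            } else if (tempDate.get() < minDay.get()) {
                minDay = tempDate;
            }
        }
        return new long[] { minDay.get(), maxDay.get() };
    }

    // Copies values with set() and starts from extreme bounds.
    static long[] fixedRange(Iterable<MutableLong> values) {
        MutableLong minDay = new MutableLong(Long.MAX_VALUE);
        MutableLong maxDay = new MutableLong(Long.MIN_VALUE);
        for (MutableLong tempDate : values) {
            if (tempDate.get() > maxDay.get()) {
                maxDay.set(tempDate.get()); // copy the value, keep our own object
            }
            if (tempDate.get() < minDay.get()) {
                minDay.set(tempDate.get());
            }
        }
        return new long[] { minDay.get(), maxDay.get() };
    }

    public static void main(String[] args) {
        long[] buggy = buggyRange(reusingIterable(300L, 100L, 200L));
        long[] fixed = fixedRange(reusingIterable(300L, 100L, 200L));
        System.out.println("buggy: " + buggy[0] + " - " + buggy[1]); // buggy: 0 - 200
        System.out.println("fixed: " + fixed[0] + " - " + fixed[1]); // fixed: 100 - 300
    }
}
```

In the buggy version the reported maximum is simply the last value read, and the minimum never moves off 0, which matches the constant 1970 start year in the question's output.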