hadoop mapreduce multipleoutputs

MultipleOutputs in a mapper: Hadoop version issue


I am implementing a Hadoop MapReduce job. The input to my mapper is a table like the one below:

customerid, IP, Attr , Date

customer1, IP1, attr1, date1

customer2, IP2, attr1, date2

The mapper should write multiple output files:

File 1: IP-m-00000

key, value

customer1_IP1 , date1

customer2_IP2 , date2

File 2: Attr-m-00000

key, value

customer1_attr1 , date1

customer2_attr1 , date2

I have Hadoop 2.2.0 installed and I am using the following code:

MultipleOutputs.addMultiNamedOutput(job, "IP", TextOutputFormat.class, Text.class, Text.class); // in the Driver class
MultipleOutputs.getCollector("IP", context).collect(txtKey, txtValue); // in the Mapper class

Here txtKey is customerid_$Attribute and txtValue is the date.
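To make the intended per-record logic concrete, here is a minimal sketch in plain Java, with no Hadoop dependency. The class name NamedOutputSketch, the method route(), and the in-memory map standing in for the "IP" and "Attr" named outputs are all hypothetical and only illustrate how each input row turns into the two key/value pairs shown above. It assumes comma-separated rows with exactly four fields and does not treat the header row specially.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models the mapper's routing, not the Hadoop API.
class NamedOutputSketch {

    // Returns output name -> list of "key<TAB>value" lines.
    static Map<String, List<String>> route(List<String> rows) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        outputs.put("IP", new ArrayList<>());
        outputs.put("Attr", new ArrayList<>());

        for (String row : rows) {
            // Fields: customerid, IP, Attr, Date (whitespace around commas is ignored).
            String[] f = row.trim().split("\\s*,\\s*");
            if (f.length != 4) {
                continue; // skip malformed rows
            }
            String customer = f[0];
            String date = f[3];
            // Key is customerid_$Attribute, value is the date.
            outputs.get("IP").add(customer + "_" + f[1] + "\t" + date);
            outputs.get("Attr").add(customer + "_" + f[2] + "\t" + date);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "customer1, IP1, attr1, date1",
                "customer2, IP2, attr1, date2");
        // Print each named output with its records.
        route(rows).forEach((name, lines) -> {
            System.out.println(name + ":");
            lines.forEach(line -> System.out.println("  " + line));
        });
    }
}
```

In the real job, each add() call would be replaced by a write to the corresponding named output.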

I have Hadoop 2.8.0 installed on another personal machine, and there the MultipleOutputs object has a write() method that was very easy to use. The MultipleOutputs.write() that I used in hadoop-2.8.0 is not implemented in hadoop-2.2.0.

Any ideas on how to write multiple output files in hadoop-2.2.0, where MultipleOutputs.write() is not available?

If this question needs any changes, please leave a comment rather than closing it.

Thanks, Guru


Solution

  • The code above calls addMultiNamedOutput(), which is intended for multi-level (multi-named) output. Use addNamedOutput() instead; that worked on hadoop-2.2.0.

    If you want files named Attr-m/r-00000, use addNamedOutput(). If you want files named Attr-SubAttr-m/r-00000, use addMultiNamedOutput().