I am implementing a Hadoop MapReduce job. The input to my mapper is a table like the one below:
customerid, IP, Attr , Date
customer1, IP1, attr1, date1
customer2, IP2, attr1, date2
The output from the mapper should be multiple files:
File 1: IP-m-00000
key, value
customer1_IP1 , date1
customer2_IP2 , date2
File 2: Attr-m-00000
key, value
customer1_attr1 , date1
customer2_attr1 , date2
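To make the target layout concrete, here is a minimal plain-Java sketch (no Hadoop needed) of how one input row maps to one record per output file. The class name RowSplitter and its methods are made up for illustration, and the tab between key and value is an assumption that mirrors the default separator of TextOutputFormat.

```java
public class RowSplitter {

    // Splits one input row "customerid, IP, Attr, Date" into trimmed fields.
    static String[] fields(String row) {
        String[] parts = row.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    // Record destined for the IP file: key customerid_IP, value date.
    static String ipRecord(String row) {
        String[] f = fields(row);
        return f[0] + "_" + f[1] + "\t" + f[3];
    }

    // Record destined for the Attr file: key customerid_Attr, value date.
    static String attrRecord(String row) {
        String[] f = fields(row);
        return f[0] + "_" + f[2] + "\t" + f[3];
    }

    public static void main(String[] args) {
        String row = "customer1, IP1, attr1, date1";
        System.out.println(ipRecord(row));
        System.out.println(attrRecord(row));
    }
}
```

Each row therefore produces exactly two records, one for each named output.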
I have Hadoop 2.2.0 installed and I am using the following code:
MultipleOutputs.addMultiNamedOutput (job, "IP", TextOutputFormat.class, Text.class, Text.class); // in the Driver.class
MultipleOutputs.getCollector("IP", context).collect(txtKey, txtValue); // in the Mapper.class
where txtKey is customerid_$Attribute and txtValue is the date.
I have Hadoop 2.8.0 installed on another personal machine, where the MultipleOutputs object has a write() method that was very easy to use. MultipleOutputs.write(), which is available in hadoop-2.8.0, is not implemented in hadoop-2.2.0.
Any ideas on how to write multiple output files in hadoop-2.2.0, where MultipleOutputs.write() is not available?
If this question needs any modification, please leave a comment rather than closing it!
Thanks, Guru
The code above calls addMultiNamedOutput(), which is meant for multi-level output. Use addNamedOutput() instead; that worked on hadoop-2.2.0.
If you want Attr-m/r-00000, use addNamedOutput(). If you want Attr-SubAttr-m/r-00000, use addMultiNamedOutput().
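A rough sketch of what this could look like, assuming the job uses the older org.apache.hadoop.mapred API (the package whose MultipleOutputs provides addMultiNamedOutput() and getCollector()). The class name IpAttrMapper and the helper registerOutputs() are made up for illustration, and the input is assumed to be comma-separated text lines; a header row, if present, is not filtered out here.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class IpAttrMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private MultipleOutputs mos;

    // Driver side (hypothetical helper): call once on the JobConf before submitting.
    public static void registerOutputs(JobConf conf) {
        MultipleOutputs.addNamedOutput(conf, "IP", TextOutputFormat.class, Text.class, Text.class);
        MultipleOutputs.addNamedOutput(conf, "Attr", TextOutputFormat.class, Text.class, Text.class);
    }

    @Override
    public void configure(JobConf conf) {
        mos = new MultipleOutputs(conf);
    }

    @Override
    @SuppressWarnings("unchecked")
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        String[] f = line.toString().split(",");
        if (f.length < 4) {
            return; // skip malformed rows
        }
        String customer = f[0].trim();
        Text date = new Text(f[3].trim());
        // Single-level named outputs: one collector per registered name.
        mos.getCollector("IP", reporter).collect(new Text(customer + "_" + f[1].trim()), date);
        mos.getCollector("Attr", reporter).collect(new Text(customer + "_" + f[2].trim()), date);
    }

    @Override
    public void close() throws IOException {
        mos.close(); // closes the writers behind the named outputs
    }
}
```

Note that getCollector() is an instance method here, so the mapper builds a MultipleOutputs object in configure() and closes it in close() instead of calling it statically.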