hadoop cascading

How to merge output of a cascading job to a specific size


I have a Cascading job that outputs 30 files of about 25 MB each. Is there any way to merge them into files of roughly 256 MB each? I tried -Dmapreduce.job.reduces=1, but it does not seem to work. Any guidance would be helpful.


Solution

  • Total output size = 30 × 25 MB = 750 MB,

    required output files = 750 / 256 ≈ 2.93, rounded up to 3.

    You can add one more map-reduce step to your job. On the final output pipe, put a GroupBy, Unique, or any other operation for which you can set the number of reducers, and set that number to 3 or 4. Each reducer writes one output file, so the number of output files equals the number of reducers.
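The arithmetic above can be sketched in plain Java. This is only an illustration of the rounding step, not part of Cascading: the class name `ReducerCount` and the helper `reducersFor` are hypothetical, and the sizes are the ones from the question.

```java
public class ReducerCount {

    // Hypothetical helper: the smallest number of output files (reducers)
    // such that totalBytes fits into files of at most targetBytes each.
    static int reducersFor(long totalBytes, long targetBytes) {
        if (totalBytes < 0 || targetBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        // Ceiling division: round up so no file has to exceed the target size.
        long n = (totalBytes + targetBytes - 1) / targetBytes;
        // Always keep at least one reducer, even for empty output.
        return (int) Math.max(1L, n);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long total = 30L * 25L * mb;   // 30 files of about 25 MB each
        long target = 256L * mb;       // desired size per output file
        System.out.println(reducersFor(total, target)); // prints 3
    }
}
```

The resulting value is what you would set as the reducer count for the extra grouping step. Because records are distributed across reducers by key, the files will be only approximately equal in size, which is why choosing 3 or 4 is reasonable.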