I have a Cascading job that outputs 30 files of 25 MB each. Is there any way I can reduce this to 256 MB files instead? I tried -Dmapreduce.job.reduces=1, but it does not seem to work. Any guidance would be helpful.
Total output size = 30 * 25 MB = 750 MB,
required number of output files = 750 / 256 ≈ 3,
You can add one more map-reduce step to your job. On the final output pipe, apply a GroupBy, Unique, or any other operation that lets you set the number of reducers, and set that number to 3 or 4. The number of output files will then equal the number of reducers. A sketch of this is shown below.
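Here is a minimal sketch of that extra coalescing step, assuming Cascading 2.x on Hadoop 2. The paths, flow name, and the choice to group on the "line" field are illustrative assumptions, not part of your original job; the key idea is that the GroupBy forces a reduce phase, and capping the reduce task count at 3 yields three part files of roughly 256 MB each.

```java
import java.util.Properties;

import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class CoalesceOutput {
    public static void main(String[] args) {
        // Hypothetical HDFS paths; point these at your job's actual output
        // directory and the desired coalesced location.
        Tap source = new Hfs(new TextLine(new Fields("line")), "hdfs:///data/job-output-small-files");
        Tap sink   = new Hfs(new TextLine(new Fields("line")), "hdfs:///data/job-output-coalesced");

        // The GroupBy forces a reduce phase, so the sink is written by the
        // reducers and the number of part files equals the reducer count.
        // (Grouping on "line" also sorts the data; any grouping key works
        // for the purpose of coalescing.)
        Pipe pipe = new Pipe("coalesce");
        pipe = new GroupBy(pipe, new Fields("line"));

        // Cap the reduce task count at 3 (~750 MB / 256 MB per file).
        // On older Hadoop versions the property is "mapred.reduce.tasks".
        Properties properties = new Properties();
        properties.setProperty("mapreduce.job.reduces", "3");

        new HadoopFlowConnector(properties)
            .connect("coalesce-flow", source, sink, pipe)
            .complete();
    }
}
```

Note that -D flags on the command line are often ignored because Cascading builds its own job configuration; passing the property through the FlowConnector's Properties, as above, is the reliable way to control the reducer count for this extra step.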