
Is it possible to specify the number of mappers and reducers when using s3-dist-cp?


I'm trying to copy data from an EMR cluster to S3 using s3-dist-cp. Can I set the number of reducers higher than the default to speed up the copy?


Solution

  • To set the number of reducers, pass the mapreduce.job.reduces property as a -D option, for example:

    s3-dist-cp -Dmapreduce.job.reduces=10 --src hdfs://path/to/data/ --dest s3://path/to/s3/
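
If the reducer count needs to vary between runs, the command above can be wrapped in a small helper. This is only a sketch: the function name build_s3distcp_cmd and its arguments are hypothetical, and it prints the command rather than running it, since s3-dist-cp is normally available only on an EMR node.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of s3-dist-cp): prints the command line
# for a given reducer count, source path, and destination path.
build_s3distcp_cmd() {
  local reducers="$1" src="$2" dest="$3"
  # The -D property is placed first, mirroring the command in the answer.
  printf 's3-dist-cp -Dmapreduce.job.reduces=%s --src %s --dest %s\n' \
    "$reducers" "$src" "$dest"
}

# Same placeholder paths as in the answer above.
build_s3distcp_cmd 10 hdfs://path/to/data/ s3://path/to/s3/
```

On the cluster, the printed line can be reviewed and then run directly. Whether more reducers actually shortens the copy depends on the cluster's capacity and the data layout, so it is worth trying a few values.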