hadoop amazon-s3 mapreduce distcp s3distcp

-Dmapred.job.name does not work with s3-dist-cp command


I'd like to copy some files from HDFS on an EMR cluster to an S3 bucket using s3-dist-cp. I tried this command from the EMR master node:

s3-dist-cp -Dmapred.job.name=my_copy_job --src hdfs:///user/hadoop/abc --dest s3://my_bucket/my_key/

The command runs fine, but when I check the job name in the YARN ResourceManager UI, it is displayed like this: S3DistCp hdfs:///user/hadoop/abc -> s3://my_bucket/my_key/

whereas I expected the job name to be my_copy_job.

I'd appreciate any help!

Note: when I run hadoop distcp with the option -Dmapred.job.name=my_copy_job, the job name is displayed correctly in the YARN RM UI, but the job eventually fails.


Solution

  • s3-dist-cp does not support -D style properties set at runtime the way hadoop distcp does. S3DistCp accepts only a fixed set of options, as listed in its documentation. In addition to the options defined by S3DistCp itself, it accepts the Tool interface's generic options.

    The job name, however, is not one of them. It is hardcoded in the S3DistCp code and cannot be overridden.
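
Since the name cannot be changed, one possible workaround is to look the job up by the name S3DistCp generates. The sketch below is only an illustration: `s3distcp_job_name` is a hypothetical helper, and the "S3DistCp <src> -> <dest>" format is assumed from what the question reports in the RM UI, so it may differ between EMR releases.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the job name S3DistCp generates, using the
# "S3DistCp <src> -> <dest>" format reported in the question (an assumption).
s3distcp_job_name() {
  printf 'S3DistCp %s -> %s' "$1" "$2"
}

job_name="$(s3distcp_job_name 'hdfs:///user/hadoop/abc' 's3://my_bucket/my_key/')"
echo "$job_name"

# On the cluster, filter the YARN application list by that fixed string.
# Skipped when the yarn CLI is not on PATH.
if command -v yarn >/dev/null 2>&1; then
  yarn application -list -appStates ALL 2>/dev/null | grep -F -- "$job_name" || true
fi
```

Matching on the source and destination paths with `grep -F` treats the name as a literal string, so characters such as `/` and `.` in the paths need no escaping.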