I'd like to copy some files from HDFS on EMR to an S3 bucket using s3-dist-cp. I tried this command from the EMR master node:
s3-dist-cp -Dmapred.job.name=my_copy_job --src hdfs:///user/hadoop/abc s3://my_bucket/my_key/
The command runs fine, but when I check the job name in the YARN ResourceManager UI, it is displayed like this:
S3DistCp hdfs:///user/hadoop/abc -> s3://my_bucket/my_key/
whereas I expected the job name to be my_copy_job.
I'd appreciate any help!
Note: when I run hadoop distcp with the option -Dmapred.job.name=my_copy_job, the job name is displayed correctly in the YARN RM UI, but the job eventually fails.
s3-dist-cp does not support -D style properties set at runtime the way hadoop distcp does. S3DistCp accepts only a finite set of options, as listed here. In addition to the options defined by S3DistCp, it accepts the Tool interface's generic options.
But JobName is not one of them. JobName is hardcoded in the S3DistCp code and cannot be overridden.
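Since the name cannot be changed, one possible workaround is to look the job up by the name S3DistCp generates. The sketch below assumes the name always follows the pattern shown in the question's RM UI output (`S3DistCp <src> -> <dest>`), which may differ between EMR releases. The helper names `s3distcp_job_name` and `find_s3distcp_job` are made up for illustration; `yarn application -list -appStates RUNNING` is the standard YARN CLI call.

```shell
#!/usr/bin/env bash

# Build the job name S3DistCp appears to generate.
# Assumption: the pattern is taken from the RM UI output in the question.
s3distcp_job_name() {
  local src="$1" dest="$2"
  printf 'S3DistCp %s -> %s\n' "$src" "$dest"
}

# Hypothetical helper: print the running YARN applications whose
# listing line contains that generated name (fixed-string match).
find_s3distcp_job() {
  local name
  name="$(s3distcp_job_name "$1" "$2")"
  yarn application -list -appStates RUNNING 2>/dev/null | grep -F -- "$name"
}
```

On the master node, `find_s3distcp_job hdfs:///user/hadoop/abc s3://my_bucket/my_key/` should then print the matching application line, if the job is still running and the name pattern holds.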