hadoop hadoop-yarn distcp

Set YARN application name for Hadoop Distcp job


NOTE: I don't want to specify a YARN queue name, as in "Hadoop: specify yarn queue for distcp".


I frequently use hadoop distcp for moving data around HDFS and would like to have a descriptive application name for these jobs.


Presently all copy jobs appear under the name "distcp" in the Resource Manager UI, so there's no way to tell different jobs apart.



Is there a way to give each job its own application name?


Solution

  • Like many other MapReduce tools, hadoop distcp lets you pass mapred properties as generic options of the form

    -Dmapred.property.name=property-value


    So when I run

    hadoop distcp \
      -Dmapred.job.name=billing_db.replicate \
      -m 10 \
      /user/hive/warehouse/billing_db.db/ \
      s3a://my-s3-bucket/billing_db.db/
    

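    the job appears under that name in the Resource Manager UI. (On Hadoop 2 and later, mapred.job.name is a deprecated alias of mapreduce.job.name; either spelling should work.)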

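If you run such copies regularly, the -D option can be wrapped in a small shell helper so that every job gets a consistent name. This is only a sketch: the function names `build_job_name` and `named_distcp` and the `distcp.<label>.<date>` naming scheme are my own assumptions, not anything Hadoop provides. Only `hadoop distcp` and the `mapred.job.name` property come from the command above.

```shell
# Hypothetical helper: print a descriptive job name, "distcp.<label>.<date>".
# $1 = label, $2 = optional date stamp (defaults to today as YYYYMMDD).
build_job_name() {
  printf 'distcp.%s.%s\n' "$1" "${2:-$(date +%Y%m%d)}"
}

# Hypothetical wrapper: run distcp with the generated name.
# $1 = label, $2 = source path, $3 = destination path.
# The -D generic option is placed before the paths, as in the command above.
named_distcp() {
  hadoop distcp \
    -Dmapred.job.name="$(build_job_name "$1")" \
    "$2" "$3"
}
```

With these defined, `named_distcp billing_db.replicate /user/hive/warehouse/billing_db.db/ s3a://my-s3-bucket/billing_db.db/` would submit the copy under a name such as distcp.billing_db.replicate.<today's date>, which makes repeated runs easy to tell apart in the Resource Manager UI.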

