NOTE: I don't want to specify a YARN queue name, as in "Hadoop: specify yarn queue for distcp".
I frequently use hadoop distcp for moving data around HDFS and would like to have a descriptive application name for these jobs.
Presently, all copy jobs appear with the name "distcp" in the Resource Manager UI, so there's no way to tell different jobs apart.
Is there a way to give each job a more descriptive name?
Like many other MapReduce tools, hadoop distcp lets you pass mapred properties on the command line using
-Dmapred.property.name=property-value
so when I use
hadoop distcp \
-Dmapred.job.name=billing_db.replicate \
-m 10 \
/user/hive/warehouse/billing_db.db/ \
s3a://my-s3-bucket/billing_db.db/
it shows up under that name in the Resource Manager UI.
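If you run many copies, it may help to generate the name instead of typing it each time. The following is only a sketch: the helper names distcp_job_name and named_distcp, the "distcp.<label>.<date>" naming scheme, and the DRY_RUN switch are my own assumptions, not part of Hadoop. It uses the same -Dmapred.job.name property as above (newer Hadoop releases list mapreduce.job.name as the current name for this property).

```shell
# Hypothetical helper: builds a name like "distcp.<label>.<YYYY-MM-DD>".
# The second argument is optional and defaults to today's date.
distcp_job_name() {
  printf 'distcp.%s.%s\n' "$1" "${2:-$(date +%F)}"
}

# Hypothetical wrapper around hadoop distcp that sets the job name.
# The body runs in a subshell so its variables do not leak into the caller.
# With DRY_RUN=1 it only prints the command it would run.
named_distcp() (
  label="$1"; src="$2"; dst="$3"
  name="$(distcp_job_name "$label")"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "hadoop distcp -Dmapred.job.name=$name -m 10 $src $dst"
  else
    hadoop distcp "-Dmapred.job.name=$name" -m 10 "$src" "$dst"
  fi
)
```

For example, distcp_job_name billing_db.replicate 2024-01-31 prints distcp.billing_db.replicate.2024-01-31, and named_distcp billing_db.replicate /user/hive/warehouse/billing_db.db/ s3a://my-s3-bucket/billing_db.db/ runs the copy under a name that includes today's date. Avoid spaces in the label so the -D argument stays a single token.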