I ran distcp to copy files between two HDFS clusters of the same version. When the job fails, I want to find the failed MapReduce tasks and the related file paths, then replay the copy.
Retrying a failed copy already happens automatically: each map task is attempted up to mapred.map.max.attempts times.
If you rerun distcp, it will only try to copy files that haven't already been copied. (Files successfully copied by a previous run are marked as "skipped" on re-execution.)
If you would like a log of the files that couldn't be copied, you can specify '-i' and '-log <logdir>'. This will ignore failures but write out a more complete log of which files failed and why.
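As a rough sketch of that invocation, something like the following should work. The NameNode hosts, ports, and paths are hypothetical placeholders, not values from your clusters; only the '-i' and '-log <logdir>' options come from the answer above.

```shell
# Hypothetical source, destination, and log locations; replace with your own.
SRC="hdfs://nn1.example.com:8020/data/src"
DST="hdfs://nn2.example.com:8020/data/dst"
LOGDIR="hdfs://nn2.example.com:8020/tmp/distcp-logs"

# -i           ignore per-file failures instead of aborting the job
# -log LOGDIR  write the copy logs to LOGDIR
DISTCP_CMD="hadoop distcp -i -log $LOGDIR $SRC $DST"

# Print the command for review before running it on a real cluster.
echo "$DISTCP_CMD"
```

Once the printed command looks right, run it directly. After the job finishes, the files under the log directory can be inspected to see which paths failed, and running the same command again should only retry the files that are still missing.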