hadoop hdfs distcp

How can I get distcp failed files and replay the task?


I used distcp to copy a file between two HDFS clusters running the same version. When the job fails, I want to find the failed MapReduce tasks and the related file paths, then replay them.


Solution

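  • Retrying already happens automatically: each failed map task is re-attempted up to mapred.map.max.attempts times (the property is named mapreduce.map.maxattempts in newer Hadoop releases).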

    If you rerun distcp, it only tries to copy files that haven't already been copied. Files successfully copied by a previous run are marked as "skipped" on re-execution.

    If you would like a log of the files that couldn't be copied, specify -i and -log <logdir>. This ignores failures but writes a more complete log of which files failed and why.
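
The two steps above (run with -i and -log, then rerun to pick up whatever is still missing) might look roughly like the sketch below. The cluster addresses, paths, and log directory names are hypothetical placeholders, and the `run` helper with its DRY_RUN switch is an illustration only, not part of distcp. By default the script just prints the commands so they can be checked before launching anything.

```shell
#!/bin/sh
# Hypothetical source, destination, and log locations; replace with your own.
SRC="hdfs://nn1:8020/data/src"
DST="hdfs://nn2:8020/data/dst"
LOGDIR="hdfs://nn2:8020/tmp/distcp-logs"

# Illustration-only helper: prints the command unless DRY_RUN=0 is set,
# in which case the command is actually executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# First pass: -i ignores per-file failures, -log writes the copy log to LOGDIR.
run hadoop distcp -i -log "$LOGDIR" "$SRC" "$DST"

# Replay: the same copy again; files already present at the destination
# should be reported as skipped. A separate log directory (an assumption made
# here) keeps the logs of the two runs apart.
run hadoop distcp -i -log "$LOGDIR-retry" "$SRC" "$DST"
```

Run it with DRY_RUN=0 to launch the jobs. The replay deliberately uses the same options as the first pass, since options such as -update change how source paths map onto the destination.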