hadoop hdfs distcp

DistCP - Even simple copies result in CRC Exceptions


I'm running into an issue using distcp to copy files: every copy fails with an IOException (checksum mismatch), even for a simple copy within the cluster (e.g. hadoop distcp -pbugctrx /foo/bar /foo/baz).

If I force the copy to complete with -skipcrccheck, hdfs dfs -checksum reports different checksums for source and destination, but the difference isn't caused by the actual data: hdfs dfs -cat | md5sum returns matching digests for both files.
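For reference, the content comparison described above could be scripted roughly as follows. This is only a sketch: the function names and the READER variable are hypothetical helpers introduced here, not part of Hadoop.

```shell
# READER is the command used to stream a file's bytes. It defaults to
# "hdfs dfs -cat"; override it (e.g. READER=cat) to try this on local files.
# READER is a hypothetical helper variable, not a Hadoop setting.
READER="${READER:-hdfs dfs -cat}"

# Print the MD5 digest of a file's decrypted contents.
# Note: a failed read is not detected here; an unreadable path hashes as
# empty input, so check that both paths exist first.
content_md5() {
  $READER "$1" | md5sum | cut -d' ' -f1
}

# Succeed (exit 0) only if both files have the same content digest.
same_content() {
  [ "$(content_md5 "$1")" = "$(content_md5 "$2")" ]
}
```

Usage, with placeholder file names: same_content /foo/bar/part-0 /foo/baz/part-0 && echo "contents match"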

I'm leery of disabling a data integrity check if I don't need to. Is there a better way to address this failing check than just ignoring it?


Solution

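  • The source and target may be in different encryption zones. In that case the checksum comparison fails as well, because HDFS computes file checksums over the encrypted block data, which differs between source and destination even when the decrypted contents are identical.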
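
If encryption zones turn out to be the cause, one way to check and work around it might look like the sketch below. The hdfs crypto -listZones command and the distcp -update and -skipcrccheck flags are real; the wrapper functions and the HADOOP and HDFS variables are hypothetical conveniences added here so the commands can be previewed before they are run.

```shell
# HADOOP and HDFS default to the real CLIs. Override them (e.g. HADOOP=echo)
# to print a command instead of running it. Both variables are hypothetical
# helpers, not Hadoop settings.
HADOOP="${HADOOP:-hadoop}"
HDFS="${HDFS:-hdfs}"

# List the cluster's encryption zones (requires HDFS superuser rights) to see
# whether the source and target paths fall under encrypted locations.
list_zones() {
  $HDFS crypto -listZones
}

# Copy without the CRC comparison, which cannot match on encrypted data.
# distcp accepts -skipcrccheck only together with -update.
copy_across_zones() {
  $HADOOP distcp -update -skipcrccheck "$1" "$2"
}
```

Usage: copy_across_zones /foo/bar /foo/baz. Note that -update copies the contents of the source directory into the target rather than the directory itself, and that skipping the CRC check leaves verification to you, for example by comparing hdfs dfs -cat | md5sum output for source and destination afterwards.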