My goal is to use the DistCp API from Java.
From the command line I am able to run a distcp:
hadoop --config /path/to/cluster2/hadoop/conf distcp -skipcrccheck -update hdfs://clusterHA1/path/to/file hdfs://clusterHA2/path/to/target
In Java I have trouble using the -skipcrccheck and -update options.
final DistCpOptions distcpOption = new DistCpOptions(sourceFile, destFile);
distcpOption.setSkipCRC(true);
distcpOption.setSyncFolder(true);
runExitCode = this.distCpRun(sourceFile, destFile, distcpOption);
I get this exception:
java.lang.IllegalArgumentException: Skip CRC is valid only with update options
Looking at the code, the order of the setter calls matters, so I swapped the two options:
final DistCpOptions distcpOption = new DistCpOptions(sourceFile, destFile);
distcpOption.setSyncFolder(true);
distcpOption.setSkipCRC(true);
runExitCode = this.distCpRun(sourceFile, destFile, distcpOption);
I get:
java.io.IOException: Check-sum mismatch between source and target
I am pretty sure that setSyncFolder sets the update option; in DistCpOptionSwitch:
public enum DistCpOptionSwitch {
SYNC_FOLDERS("distcp.sync.folders", new Option("update", false, "Update target, copying only missingfiles or directories")),
}
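The order dependence described above can be illustrated with a small stand-in class. This is only a sketch: `OptionsSketch` and its methods are hypothetical and are not Hadoop's `DistCpOptions`. It assumes each setter validates the combination of flags eagerly, which is what the exception message suggests.

```java
// Simplified stand-in for eager setter validation; NOT Hadoop's DistCpOptions.
final class OptionsSketch {
    private boolean syncFolder; // models -update
    private boolean skipCrc;    // models -skipcrccheck

    void setSyncFolder(boolean value) {
        validate(value, skipCrc);
        syncFolder = value;
    }

    void setSkipCrc(boolean value) {
        // Checked against the current state, so syncFolder must already be set.
        validate(syncFolder, value);
        skipCrc = value;
    }

    private static void validate(boolean syncFolder, boolean skipCrc) {
        if (skipCrc && !syncFolder) {
            throw new IllegalArgumentException("Skip CRC is valid only with update options");
        }
    }

    // Returns true if the chosen call order passes validation.
    static boolean accepts(boolean skipCrcFirst) {
        OptionsSketch options = new OptionsSketch();
        try {
            if (skipCrcFirst) {
                options.setSkipCrc(true);
                options.setSyncFolder(true);
            } else {
                options.setSyncFolder(true);
                options.setSkipCrc(true);
            }
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("skipCrc first accepted: " + accepts(true));     // false
        System.out.println("syncFolder first accepted: " + accepts(false)); // true
    }
}
```

This only models why the first ordering throws; it does not explain the later checksum mismatch.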
I am using Hadoop 2.6.4. The checksums mismatch between the two clusters because each cluster has its own instance of Ranger KMS. I am sending a file from an unencrypted zone to an encrypted zone; this works well from the command line.
I finally solved this problem by passing the arguments directly, as on the command line, instead of building a DistCpOptions object:
distCp.run(new String[] {"-skipcrccheck", "-update", source, destination});
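If the flags need to be toggled from code, the argument array can be assembled by a small helper. This is a minimal sketch: `DistCpArgs` and `build` are hypothetical names, not part of Hadoop, and the helper only produces the strings that the command line would receive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles command-line style arguments for DistCp.
final class DistCpArgs {
    static String[] build(boolean update, boolean skipCrc, String source, String target) {
        // Mirror the rule from the exception message: skipping CRC needs -update.
        if (skipCrc && !update) {
            throw new IllegalArgumentException("-skipcrccheck requires -update");
        }
        List<String> args = new ArrayList<>();
        if (skipCrc) {
            args.add("-skipcrccheck");
        }
        if (update) {
            args.add("-update");
        }
        args.add(source);
        args.add(target);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String[] args = build(true, true,
                "hdfs://clusterHA1/path/to/file",
                "hdfs://clusterHA2/path/to/target");
        System.out.println(String.join(" ", args));
    }
}
```

The resulting array would then be passed to distCp.run(...) in place of the literal array above, assuming distCp is the same DistCp instance used in the line above.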