I have been using Spring Data Hadoop in one of my projects and have been able to run distcp jobs on Hadoop 1.x. We recently upgraded to Hadoop 2.x, and for that I upgraded spring-data-hadoop to 2.0.4. Most things still work, but I am running into an issue with distcp. Spring Data Hadoop appears to invoke DistCp like this:
Class<org.apache.hadoop.tools.DistCp> cl = org.apache.hadoop.tools.DistCp.class;
Class<?> argClass = ClassUtils.resolveClassName("org.apache.hadoop.tools.DistCp$Arguments",
        cl.getClassLoader());
Notice that the Spring code looks for the Arguments inner class of the DistCp class, but this inner class no longer seems to exist in the newer DistCp code. When I run the job I get this error:
Caused by: java.lang.IllegalStateException: Cannot run distCp impersonated as 'null'
at org.springframework.data.hadoop.fs.DistCp.copy(DistCp.java:268) ~[spring-data-hadoop-core-2.0.4.RELEASE.jar:2.0.4.RELEASE]
at org.springframework.data.hadoop.fs.DistCp.copy(DistCp.java:216) ~[spring-data-hadoop-core-2.0.4.RELEASE.jar:2.0.4.RELEASE]
at org.springframework.data.hadoop.fs.DistCp.copy(DistCp.java:152) ~[spring-data-hadoop-core-2.0.4.RELEASE.jar:2.0.4.RELEASE]
at com.att.hadoop.hdfspub.source.hdfs.HdfsFileCopier.copyFolder(HdfsFileCopier.java:104) ~[classes/:na]
... 45 common frames omitted
Caused by: java.lang.IllegalArgumentException: Cannot find class [org.apache.hadoop.tools.DistCp$Arguments]
at org.springframework.util.ClassUtils.resolveClassName(ClassUtils.java:286) ~[spring-core-4.1.4.RELEASE.jar:4.1.4.RELEASE]
at org.springframework.data.hadoop.fs.DistCp.invokeCopy(DistCp.java:275) ~[spring-data-hadoop-core-2.0.4.RELEASE.jar:2.0.4.RELEASE]
at org.springframework.data.hadoop.fs.DistCp.copy(DistCp.java:265) ~[spring-data-hadoop-core-2.0.4.RELEASE.jar:2.0.4.RELEASE]
... 48 common frames omitted
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.tools.DistCp$Arguments
In my pom.xml I have included spring-data-hadoop 2.0.4.RELEASE and hadoop-distcp 2.2.0.
This has been addressed in the recent 2.1 RC1 release, and we haven't yet looked into backporting the fix to the 2.0.x branch. If you want to try the 2.1.0.RC1 version, you need to include the Spring IO repo in your build; see the Quick Start section of the project page: http://projects.spring.io/spring-hadoop/
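If you want to confirm which DistCp API is actually on your classpath (before or after switching versions), a small check along these lines may help. This is only a sketch: the class name DistCpApiCheck and the helper isClassPresent are made up for illustration and are not part of Spring for Apache Hadoop, and it uses plain Class.forName rather than Spring's ClassUtils so that it runs without any extra dependencies.

```java
public class DistCpApiCheck {

    // Hypothetical helper (not a Spring or Hadoop API): reports whether a class
    // can be loaded by the given class loader, without initializing it.
    public static boolean isClassPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = DistCpApiCheck.class.getClassLoader();

        // The outer DistCp class and the inner class named in the stack trace.
        boolean distCp = isClassPresent("org.apache.hadoop.tools.DistCp", loader);
        boolean arguments = isClassPresent("org.apache.hadoop.tools.DistCp$Arguments", loader);

        System.out.println("DistCp on classpath:           " + distCp);
        System.out.println("DistCp$Arguments on classpath: " + arguments);
    }
}
```

Run it with the same classpath as your application. If DistCp is found but DistCp$Arguments is not, you are in the situation shown in the stack trace above, where the reflective lookup of the inner class fails.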