spring, hadoop, spring-data, spring-data-hadoop

Running a DistCp job from Spring in Hadoop 2.x


I have been using Spring Data Hadoop in one of my projects and have been able to run DistCp jobs on Hadoop 1.x. We recently upgraded to Hadoop 2.x, and for that I upgraded spring-data-hadoop to 2.0.4. Most things still work, but I am running into issues with DistCp. It seems that Spring Data Hadoop calls DistCp like this:

Class<org.apache.hadoop.tools.DistCp> cl = org.apache.hadoop.tools.DistCp.class;
Class<?> argClass = ClassUtils.resolveClassName("org.apache.hadoop.tools.DistCp$Arguments",
                cl.getClassLoader());

https://github.com/spring-projects/spring-hadoop/blob/2.0.4.RELEASE/spring-hadoop-core/src/main/java/org/springframework/data/hadoop/fs/DistCp.java#L274-L275

Notice that the Spring code looks for the Arguments inner class in the DistCp class, but this inner class no longer seems to exist in the newer DistCp code. When I run the job I get this error:

Caused by: java.lang.IllegalStateException: Cannot run distCp impersonated as 'null'
        at org.springframework.data.hadoop.fs.DistCp.copy(DistCp.java:268) ~[spring-data-hadoop-core-2.0.4.RELEASE.jar:2.0.4.RELEASE]
        at org.springframework.data.hadoop.fs.DistCp.copy(DistCp.java:216) ~[spring-data-hadoop-core-2.0.4.RELEASE.jar:2.0.4.RELEASE]
        at org.springframework.data.hadoop.fs.DistCp.copy(DistCp.java:152) ~[spring-data-hadoop-core-2.0.4.RELEASE.jar:2.0.4.RELEASE]
        at com.att.hadoop.hdfspub.source.hdfs.HdfsFileCopier.copyFolder(HdfsFileCopier.java:104) ~[classes/:na]
        ... 45 common frames omitted
Caused by: java.lang.IllegalArgumentException: Cannot find class [org.apache.hadoop.tools.DistCp$Arguments]
        at org.springframework.util.ClassUtils.resolveClassName(ClassUtils.java:286) ~[spring-core-4.1.4.RELEASE.jar:4.1.4.RELEASE]
        at org.springframework.data.hadoop.fs.DistCp.invokeCopy(DistCp.java:275) ~[spring-data-hadoop-core-2.0.4.RELEASE.jar:2.0.4.RELEASE]
        at org.springframework.data.hadoop.fs.DistCp.copy(DistCp.java:265) ~[spring-data-hadoop-core-2.0.4.RELEASE.jar:2.0.4.RELEASE]
        ... 48 common frames omitted
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.tools.DistCp$Arguments

In my pom.xml I have included spring-data-hadoop 2.0.4.RELEASE and hadoop-distcp 2.2.0.
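To confirm which DistCp variant is actually on the classpath, a small check along these lines could be run with the same dependencies as the application. This is only a sketch: the class name DistCpClasspathCheck and the helper isPresent are made up for illustration, and the only Hadoop names used are the two that appear in the stack trace above.

```java
public class DistCpClasspathCheck {

    // Returns true if the named class can be loaded by the given loader.
    // The class is loaded without running its static initializers.
    public static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = DistCpClasspathCheck.class.getClassLoader();

        // The outer class, expected to come from the hadoop-distcp jar.
        boolean distCp = isPresent("org.apache.hadoop.tools.DistCp", loader);
        // The inner class that spring-data-hadoop 2.0.4 tries to resolve.
        boolean arguments = isPresent("org.apache.hadoop.tools.DistCp$Arguments", loader);

        System.out.println("DistCp present: " + distCp);
        System.out.println("DistCp$Arguments present: " + arguments);
    }
}
```

If the first line reports true and the second false, the DistCp on the classpath lacks the Arguments inner class, which matches the ClassNotFoundException in the trace.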


Solution

  • This has been addressed in the recent 2.1.0.RC1 release; we haven't yet looked into backporting the fix to the 2.0.x branch. If you want to try the 2.1.0.RC1 version, you need to include the Spring IO repo in your build. See the Quick Start section of the project page: http://projects.spring.io/spring-hadoop/