mavenhadooppom.xmlcloudera-quickstart-vmgiraph

Apache Giraph on Cloudera VM - POM for org.apache.hadoop:hadoop-core:jar:2.6.0 missing, no dependency info


I am new to Hadoop/Giraph and Java. As part of a task, I downloaded Cloudera Quickstart VM and Giraph on top of it. I am using this book named "Practical Graph Analytics with Apache Giraph; Authors: Shaposhnik, Roman, Martella, Claudio, Logothetis, Dionysios" from which I tried to run the first example on Page 111 (Twitter Followership Graph).

Please find the below error while trying to run the changed pom.xml file with the hadoop version on the cluster 2.6.0-mr1-cdh5.12.0

`[cloudera@quickstart first]$ mvn clean install
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building book-examples 1.0.0
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ book-examples ---
[INFO] Deleting /home/cloudera/workspace/first/target
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ book-examples ---
[debug] execute contextualize
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 0 resource
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ book-examples ---
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Compiling 1 source file to /home/cloudera/workspace/first/target/classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
[ERROR] /home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[5,27] error: package org.apache.hadoop.io does not exist
[ERROR] /home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[6,27] error: package org.apache.hadoop.io does not exist
[ERROR] /home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[7,29] error: cannot find symbol
[ERROR]  package org.apache.hadoop.util
/home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[14,17] error: cannot find symbol
[ERROR]  class IntWritable
/home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[14,30] error: cannot find symbol
[ERROR]  class IntWritable
/home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[15,0] error: cannot find symbol
[ERROR]  class NullWritable
/home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[15,14] error: cannot find symbol
[ERROR]  class NullWritable
/home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[17,28] error: cannot find symbol
[ERROR]  class GiraphHelloWorld
/home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[18,3] error: cannot find symbol
[ERROR]  class GiraphHelloWorld
/home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[18,16] error: cannot find symbol
[ERROR]  class GiraphHelloWorld
/home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[19,12] error: cannot find symbol
[ERROR]  class GiraphHelloWorld
/home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[24,12] error: cannot find symbol
[ERROR]  class GiraphHelloWorld
/home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[24,25] error: cannot find symbol
[ERROR]  class GiraphHelloWorld
/home/cloudera/workspace/first/src/main/java/GiraphHelloWorld.java:[34,14] error: cannot find symbol
[INFO] 14 errors 
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.495s
[INFO] Finished at: Fri Dec 08 14:57:01 PST 2017
[INFO] Final Memory: 18M/57M

`

I added the Cloudera Repository as per a fellow Stack overflow response. Please find the updated pom xml for which the above error applies:

`<?xml version="1.0" encoding="UTF-8"?>
<project>
<modelVersion>4.0.0</modelVersion>

<groupId>giraph</groupId>
<artifactId>book-examples</artifactId>
<version>1.0.0</version>

<dependencies>
<dependency>
<groupId>org.apache.giraph</groupId>
<artifactId>giraph-core</artifactId>
<version>1.1.0</version>
</dependency>

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.6.0-mr1-cdh5.12.0</version>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4</version>
                <executions>
                    <execution>
                        <id>create-jar-bundle</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                        <configuration>
                            <descriptorRefs>
                                <descriptorRef>jar-with-dependencies</descriptorRef>
                            </descriptorRefs>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
</plugins>
</build>

<repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>

</project>`

The version in book for hadoop is 1.2.1. There is an issue with the book dependencies between two versions.

It would be great if anyone helps me understand how to handle this error.

Thanks in advance.


Solution

  • The pom.xml in your book's copy is outdated. Use this one instead. Source: book examples repository on Github.

    Edit:

    You want to use a recent version of hadoop-core, but the most recent one Maven Central Repository (the default respository) offers is the 1.2.1. You will need to use the Cloudera Repository to get the most recent version of the library. To do that, simply add the repository to your pom.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <project>
        <modelVersion>4.0.0</modelVersion>
        <groupId>giraph</groupId>
        <artifactId>book-examples</artifactId>
        <version>1.0.0</version>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.giraph</groupId>
                <artifactId>giraph-core</artifactId>
                <version>1.1.0</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-core</artifactId>
                <version>2.6.0-mr1-cdh5.12.0</version>
            </dependency>
        </dependencies>
    
        <build>
        </build>
    
        <repositories>
            <repository>
                <id>cloudera</id>
                <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
                <releases>
                    <enabled>true</enabled>
                </releases>
                <snapshots>
                    <enabled>true</enabled>
                </snapshots>
            </repository>
        </repositories>
    </project>
    

    You should now see that Maven tries to find the jars at Cloudera first, falling back to the Central:

    $ mvn clean install
    [INFO] Scanning for projects...
    [INFO] 
    [INFO] ------------------------------------------------------------------------
    [INFO] Building book-examples 1.0.0
    [INFO] ------------------------------------------------------------------------
    ...
    Downloading: https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-core/2.6.0-mr1-cdh5.12.0/hadoop-core-2.6.0-mr1-cdh5.12.0.pom
    Downloaded: https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-core/2.6.0-mr1-cdh5.12.0/hadoop-core-2.6.0-mr1-cdh5.12.0.pom (6.4 kB at 2.9 kB/s)
    ...
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 1.661 s
    [INFO] Finished at: 2017-12-08T22:56:04+01:00
    [INFO] Final Memory: 15M/224M
    [INFO] ------------------------------------------------------------------------
    

    Edit 2:

    Ok, I finally got it. Since version 2, Hadoop changed its packaging so instead of declaring a dependency on hadoop-core, you should use the hadoop-client which is a metapackage that aggregates all the necessary dependencies for you. Remove the hadoop-core dependency from your pom.xml and add instead:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.9.0</version>
    </dependency>