scala, apache-spark, spark-graphx, pagerank

GraphX - Class file needed by Graph is missing


I am new to Scala/Spark. I am trying to compile and run one of the sample GraphX programs. Original file link: PageRank

My code, slightly edited to avoid issues:

// scalastyle:off println
package org.apache.spark.examples.graphx
// $example on$
import org.apache.spark.graphx.GraphLoader
// $example off$
import org.apache.spark.sql.SparkSession
/**
 * A PageRank example on social network dataset
 * Run with
 * {{{
 * bin/run-example graphx.PageRankExample
 * }}}
 */
object PageRankExampl {
    def main(args: Array[String]): Unit = {
        // Creates a SparkSession.
        val spark = SparkSession
            .builder
            .appName("PageRankExampl")
            .getOrCreate()
        val sc = spark.sparkContext

        // $example on$
        // Load the edges as a graph
        val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
        // Run PageRank
        val ranks = graph.pageRank(0.0001).vertices
        // Join the ranks with the usernames
        val users = sc.textFile("data/graphx/users.txt").map { line =>
            val fields = line.split(",")
            (fields(0).toLong, fields(1))
        }
        val ranksByUsername = users.join(ranks).map {
            case (id, (username, rank)) => (username, rank)
        }
        // Print the result
        println(ranksByUsername.collect().mkString("\n"))
        // $example off$
        spark.stop()
    }
}
// scalastyle:on println

Build File:

name := "hello"

version := "1.0"

libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.11" % "2.2.1" % "provided",
"org.apache.spark" % "spark-sql_2.11" % "2.2.1" % "provided",
"org.apache.spark" % "spark-graphx_2.11" % "2.2.1" % "provided"
)

The error I am getting:

Starting sbt: invoke with -help for other options

[info] Set current project to hello (in build file:/usr/local/spark-2.2.1-bin-hadoop2.7/nofel_test/)

> run

[info] Compiling 1 Scala source to /usr/local/spark-2.2.1-bin-hadoop2.7/nofel_test/target/scala-2.9.1/classes...

[error] class file needed by Graph is missing.

[error] reference type ClassTag of package reflect refers to nonexisting symbol.

[error] one error found

[error] {file:/usr/local/spark-2.2.1-bin-hadoop2.7/nofel_test/}default-b08e19/compile:compile: Compilation failed

[error] Total time: 2 s, completed Mar 26, 2018 11:14:28 PM


Solution

  • I added one line (scalaVersion) to the build file and it worked. If anyone knows the exact reason this line was necessary, please let me know; my best guess, and an alternative using %%, are sketched below the build file.

    name := "PageRank"
    
    version := "1.0"
    
    scalaVersion := "2.11.8"
    
    libraryDependencies ++= Seq(
         "org.apache.spark" % "spark-core_2.11" % "2.2.1" % "provided",
         "org.apache.spark" % "spark-sql_2.11" % "2.2.1" % "provided",
         "org.apache.spark" % "spark-graphx_2.11" % "2.2.1" % "provided"
     )