Tags: scala, apache-spark, sbt, lift-json

spark-core 1.6.1 & lift-json 2.6.3 java.lang.NoClassDefFoundError


I have a Spark application with the sbt build file shown below.
It works on my local machine, but when I submit it to an EMR cluster running Spark 1.6.1, it fails with the following error:

java.lang.NoClassDefFoundError: net/liftweb/json/JsonAST$JValue
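A NoClassDefFoundError like this only surfaces when the JVM first needs the missing class, so a job packaged without lift-json can start fine and then die at the first JSON call. A hypothetical probe (not from the original post) to check up front whether a class is on the runtime classpath:

```scala
object ClasspathProbe {
  // Returns true if the given fully-qualified class name can be loaded
  // from the current runtime classpath, false otherwise.
  def isOnClasspath(fqcn: String): Boolean =
    try { Class.forName(fqcn); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    // true on any JVM; false here unless lift-json is actually bundled
    println(isOnClasspath("java.lang.String"))
    println(isOnClasspath("net.liftweb.json.JsonAST$JValue"))
  }
}
```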

I am using "sbt package" to build the jar.

build.sbt:

organization := "com.foo"
name := "FooReport"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1"
  ,"net.liftweb" % "lift-json_2.10" % "2.6.3"
  ,"joda-time" % "joda-time" % "2.9.4"
)

Do you have any idea what's happening?


Solution

  • I've found a solution, and it works!

    The problem was that sbt package does not include dependent jars in the output jar. To overcome this I tried sbt-assembly, but then I got plenty of "deduplicate" errors when running it.

    Eventually I came across this blog post, which made everything clear:
    http://queirozf.com/entries/creating-scala-fat-jars-for-spark-on-sbt-with-sbt-assembly-plugin

    In order to submit Spark jobs to a Spark cluster (via spark-submit), you need to include all dependencies (other than Spark itself) in the jar; otherwise you won't be able to use them in your job.

    1. Create "assembly.sbt" under the /project folder.
    2. Add this line: addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
    3. Then paste the assemblyMergeStrategy block below into your build.sbt:

    assemblyMergeStrategy in assembly := {
      case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
      case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
      case PathList("org", "apache", xs @ _*) => MergeStrategy.last
      case PathList("com", "google", xs @ _*) => MergeStrategy.last
      case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
      case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
      case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
      case "about.html" => MergeStrategy.rename
      case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
      case "META-INF/mailcap" => MergeStrategy.last
      case "META-INF/mimetypes.default" => MergeStrategy.last
      case "plugin.properties" => MergeStrategy.last
      case "log4j.properties" => MergeStrategy.last
      case x =>
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(x)
    }

    Then run "sbt assembly".
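For reference, the build and submit commands might look like the following. The jar name follows sbt-assembly's default "name-assembly-version.jar" pattern; the main class and master are illustrative assumptions, not from the original post:

```shell
# Build the fat jar; sbt-assembly writes it under target/scala-2.10/ by default
sbt assembly

# Submit to the cluster (class name and master are illustrative)
spark-submit \
  --class com.foo.FooReport \
  --master yarn \
  target/scala-2.10/FooReport-assembly-1.0.jar
```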

    Now you have a big fat jar containing all of the dependencies. It can be hundreds of MB, depending on the libraries you depend on. In my case I am using AWS EMR, which already has Spark 1.6.1 installed, so I can exclude the spark-core lib from the jar using the "provided" keyword:

    "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
    

    Here is the final build.sbt file:

    organization := "com.foo"
    name := "FooReport"
    
    version := "1.0"
    
    scalaVersion := "2.10.6"
    
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
      ,"net.liftweb" % "lift-json_2.10" % "2.6.3"
      ,"joda-time" % "joda-time" % "2.9.4"
    )
    
    assemblyMergeStrategy in assembly := {
      case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
      case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
      case PathList("org", "apache", xs @ _*) => MergeStrategy.last
      case PathList("com", "google", xs @ _*) => MergeStrategy.last
      case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
      case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
      case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
      case "about.html" => MergeStrategy.rename
      case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
      case "META-INF/mailcap" => MergeStrategy.last
      case "META-INF/mimetypes.default" => MergeStrategy.last
      case "plugin.properties" => MergeStrategy.last
      case "log4j.properties" => MergeStrategy.last
      case x =>
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(x)
    }
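The PathList cases above match on the slash-separated path of each conflicting entry inside the jars, and the first matching case wins. A rough, self-contained sketch of that dispatch (a simplified model for illustration, not sbt-assembly's actual implementation):

```scala
object MergeStrategyDemo {
  // Simplified stand-ins for sbt-assembly's MergeStrategy values
  sealed trait Strategy
  case object Last extends Strategy        // keep the last occurrence
  case object Rename extends Strategy      // rename the conflicting files
  case object Deduplicate extends Strategy // default: fail unless identical

  // Mirrors the structure of the build.sbt block: split the in-jar path
  // on '/' and match on its prefix, falling back to the default strategy.
  def strategyFor(path: String): Strategy = path.split('/').toList match {
    case "javax" :: "servlet" :: _           => Last
    case "org" :: "apache" :: _              => Last
    case "about.html" :: Nil                 => Rename
    case "META-INF" :: "ECLIPSEF.RSA" :: Nil => Last
    case _                                   => Deduplicate
  }

  def main(args: Array[String]): Unit = {
    println(strategyFor("org/apache/log4j/Logger.class")) // Last
    println(strategyFor("com/foo/FooReport.class"))       // Deduplicate
  }
}
```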