scalaapache-sparkintellij-ideaapache-spark-sqlsbt

IntelliJ: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/types/DataType


PS. There's a similar question here, but that is in mvn and my project is in sbt.

First up, a few required informations:

I'm trying to run this project inside IntelliJ IDEA, for which my build.sbt looks like:

name := "kafka-latest-spark-streaming"

version := "0.1"

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided",
    "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0" % "provided",
    "org.apache.kafka" % "kafka-clients" % "0.11.0.1"
)

The main application code is similar to the one in the tutorial apart from a few imports I had to make to make implicits like $ work. When I try to run the scala file by right click and selecting Run 'Main', it throws the below error:

/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/bin/java "-javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=59919:/Applications/IntelliJ IDEA.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath /Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/deploy.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/ext/jaccess.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/ext/jfxrt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/ext/nashorn.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/javaws.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/jfxswt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/management-agent.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/plugin.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/lib/ant-javafx.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/lib/dt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/lib/javafx-mx.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/lib/jconsole.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/lib/packager.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/lib/sa-jdi.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/lib/tools.jar:/Users/sparker0i/kafka-latest-spark-streaming/target/scala-2.11/classes:/Users/sparker0i/.ivy2/cache/org.scala-lang/scala-library/jars/scala-library-2.11.12.jar:/Users/sparker0i/.ivy2/cache/net.jpountz.lz4/lz4/jars/lz4-1.3.0.jar:/Users/sparker0i/.ivy2/cache/org.apache.kafka/kafka-clients/jars/kafka-clients-0.11.0.1.jar:/Users/sparker0i/.ivy2/cache/org.slf4j/slf4j-api/jars/slf4j-api-1.7.25.jar:/Users/sparker0i/.ivy2/cache/org.xerial.snappy/snappy-java/bundles/snappy-java-1.1.2.6.jar Main
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/types/DataType
    at Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.types.DataType
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more

Solution

  • If you're trying to run spark as a local app from your IDE (master set as local[n]), you need to remove provided from your dependency definitions.

    libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-sql" % "2.4.0", //provided removed
        "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0", //provided removed
        "org.apache.kafka" % "kafka-clients" % "0.11.0.1"
    )
    

    On the other hand, when you'll be running your app on spark cluster you'll need to set spark dependencies as provided. You can just override dependencies with provided versions on certain tasks (like assembly).

    Another thing you can do is just clicking on the checkbox Include dependencies with "Provided" scope on run configuration and whenever you would run your project from IntelliJ all provided dependencies should be included.

    enter image description here