Tags: scala, apache-spark, jupyter-notebook, apache-toree

Apache Toree Spark kernel doesn't start (java.lang.NoSuchMethodError)


I installed Toree with pip and unzipped the Spark binaries to

/home/ebe/.bin/spark-2.3.0-bin-hadoop2.7

This path is stored in the environment variable $SPARK_HOME. I installed the Jupyter kernel with the following command:

jupyter toree install --spark_home=$SPARK_HOME/ --user
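
A quick sanity check after installing is to read the kernel.json that the install command generated and confirm it points at real paths. This is a minimal sketch, assuming Toree's spec lists its launcher script as the first argv entry and stores SPARK_HOME in its env section; the function names are hypothetical. Run `jupyter kernelspec list` to find the kernel directory to pass in.

```python
import json
from pathlib import Path
from typing import List


def load_kernel_spec(kernel_dir: str) -> dict:
    """Read kernel.json from an installed kernel directory."""
    with open(Path(kernel_dir) / "kernel.json", encoding="utf-8") as f:
        return json.load(f)


def kernel_spec_problems(spec: dict) -> List[str]:
    """Return human-readable problems with a Toree kernel spec (empty list if none).

    Assumes argv[0] is the launcher script and env.SPARK_HOME is set by the installer.
    """
    problems = []
    argv = spec.get("argv") or []
    if not argv:
        problems.append("argv is empty, so Jupyter has nothing to launch")
    elif not Path(argv[0]).is_file():
        problems.append(f"launcher {argv[0]} does not exist")
    spark_home = (spec.get("env") or {}).get("SPARK_HOME")
    if not spark_home:
        problems.append("env.SPARK_HOME is not set")
    elif not (Path(spark_home) / "jars").is_dir():
        # A Spark 2.x binary distribution keeps its jars under $SPARK_HOME/jars.
        problems.append(f"SPARK_HOME={spark_home} has no jars/ directory")
    return problems
```

An empty list means the paths look sane. Note that in this case the kernel does pick up SPARK_HOME (the log below prints it), so this check alone will not catch the Scala problem.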

When I start Jupyter Notebook (or JupyterLab) and open a new Apache Spark Scala notebook, the kernel never starts, and the following errors appear in the console:

[I 10:56:44.388 LabApp] Creating new notebook in /
[I 10:56:44.873 LabApp] Kernel started: f65565b1-3570-48a2-be7e-2756a058e156
Starting Spark Kernel with SPARK_HOME=/home/ebe/.bin/spark-2.3.0-bin-hadoop2.7/
2018-06-01 10:56:45 WARN  Utils:66 - Your hostname, Jackdaw resolves to a loopback address: 127.0.1.1; using 192.168.1.247 instead (on interface eno1)
2018-06-01 10:56:45 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-06-01 10:56:46 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-01 10:56:46 INFO  Main$$anon$1:161 - Kernel version: 0.1.0-incubating
2018-06-01 10:56:46 INFO  Main$$anon$1:162 - Scala version: Some(2.10.4)
2018-06-01 10:56:46 INFO  Main$$anon$1:163 - ZeroMQ (JeroMQ) version: 3.2.5
2018-06-01 10:56:46 INFO  Main$$anon$1:70 - Initializing internal actor system
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
    at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
    at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
    at akka.actor.RootActorPath.$div(ActorPath.scala:185)
    at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:465)
    at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:453)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
    at scala.util.Try$.apply(Try.scala:192)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at scala.util.Success.flatMap(Try.scala:231)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
    at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:585)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:578)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:109)
    at org.apache.toree.boot.layer.StandardBareInitialization$class.createActorSystem(BareInitialization.scala:71)
    at org.apache.toree.Main$$anon$1.createActorSystem(Main.scala:34)
    at org.apache.toree.boot.layer.StandardBareInitialization$class.initializeBare(BareInitialization.scala:60)
    at org.apache.toree.Main$$anon$1.initializeBare(Main.scala:34)
    at org.apache.toree.boot.KernelBootstrap.initialize(KernelBootstrap.scala:70)
    at org.apache.toree.Main$delayedInit$body.apply(Main.scala:39)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at org.apache.toree.Main$.main(Main.scala:23)
    at org.apache.toree.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[W 10:56:54.895 LabApp] Timeout waiting for kernel_info reply from f65565b1-3570-48a2-be7e-2756a058e156

Why does the kernel report a different Scala version when it starts (Scala version: Some(2.10.4)) when the Spark binaries are built with Scala 2.11? Spark itself reports:

Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_172

The Scala version installed on the system is also up to date:

$ scala -version
Scala code runner version 2.12.5-20180321-173609-unknown -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
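
A NoSuchMethodError on a scala.collection class is the usual symptom of code compiled for one Scala binary version (here 2.10, per Toree's startup line) running against another (2.11, bundled with Spark); Scala 2.x releases are only binary compatible within the same major.minor series. The system `scala` command (2.12.5) does not matter, because spark-submit runs the kernel with the scala-library jar from $SPARK_HOME/jars. The sketch below compares the two versions; it assumes the standard layout of a prebuilt Spark 2.x distribution, and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Toree prints e.g. "Scala version: Some(2.10.4)" at startup; keep "2.10".
_TOREE_SCALA = re.compile(r"Scala version: Some\((\d+\.\d+)\.\d+\)")
# Spark bundles its Scala runtime as e.g. jars/scala-library-2.11.8.jar; keep "2.11".
_SCALA_LIBRARY_JAR = re.compile(r"scala-library-(\d+\.\d+)\.\d+\.jar")


def toree_scala_version(kernel_log: str) -> Optional[str]:
    """Scala binary version the Toree kernel reports, or None if not found."""
    match = _TOREE_SCALA.search(kernel_log)
    return match.group(1) if match else None


def scala_version_from_jars(jar_names: Iterable[str]) -> Optional[str]:
    """Scala binary version of the first scala-library jar among jar_names."""
    for name in sorted(jar_names):
        match = _SCALA_LIBRARY_JAR.fullmatch(name)
        if match:
            return match.group(1)
    return None


def spark_scala_version(spark_home: str) -> Optional[str]:
    """Scala binary version bundled with the Spark distribution in spark_home."""
    jars_dir = Path(spark_home) / "jars"
    return scala_version_from_jars(p.name for p in jars_dir.glob("scala-library-*.jar"))


def compatible(toree: Optional[str], spark: Optional[str]) -> bool:
    """Scala 2.x is only binary compatible within one major.minor series."""
    return toree is not None and toree == spark
```

Applied to the log above, toree_scala_version returns "2.10". For spark-2.3.0-bin-hadoop2.7, which reports Scala 2.11.8 and so should ship a scala-library-2.11.x jar, spark_scala_version returns "2.11", and compatible(...) is False.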

I tried installing different builds of Toree and ran into the same problem.

How can I solve this?

OS: Manjaro Linux.


Solution

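  • This has been a huge pain for me as well. The issue seems to be that the latest released version of Toree (0.1.0-incubating, as in the log in the question) does not yet support Spark 2.x: it is built against Scala 2.10, while Spark 2.3 ships with Scala 2.11, and the two are not binary compatible. That is why the kernel dies with java.lang.NoSuchMethodError.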

    The solution is to install it from source. This gist walks you through the steps to install it on Ubuntu: https://gist.github.com/mikecroucher/b57a9e5a4c1a1a2045f30a901b186bdf

    The short version is:

    Install sbt: https://www.scala-sbt.org/1.0/docs/Setup.html

    git clone https://github.com/apache/incubator-toree
    cd incubator-toree/
    make dist
    make release
    

    Ignore the following error if you get it; it only means Docker is not installed, which the packaging step tries to use:

    /bin/sh: 1: docker: not found
    Makefile:212: recipe for target 'dist/toree-pip/toree-0.2.0.dev1.tar.gz' failed
    make: *** [dist/toree-pip/toree-0.2.0.dev1.tar.gz] Error 127
    

    Then:

    cd dist/toree-pip/
    python setup.py install
    

    Finally, you are ready to install the Toree kernel:

    jupyter toree install --kernel_name=bespoke_spark --spark_home=/path/to/spark  --user
    

    As a bonus, remember to add:

    spark.sql.catalogImplementation hive
    

    to your Spark default configuration ($SPARK_HOME/conf/spark-defaults.conf) so that you can connect to Hive (if you need it).
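
The Hive setting above can also be applied with a small helper, sketched here in Python. It appends the line to $SPARK_HOME/conf/spark-defaults.conf only if the key is not already set, so an existing value is left alone. It assumes the file is parsed as a Java properties file (key separated from value by whitespace, = or :, with # or ! starting a comment); the function names are hypothetical. A fresh Spark download ships only spark-defaults.conf.template, so the helper creates the file when it is missing.

```python
import re
from pathlib import Path

# Assumption: Spark reads spark-defaults.conf as a Java properties file, so a
# key ends at the first whitespace, '=' or ':' character.
_KEY_END = re.compile(r"[\s=:]")


def configured_keys(conf_text: str) -> set:
    """Collect the keys already set in the text of a spark-defaults.conf file."""
    keys = set()
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")):
            continue  # skip blank lines and comments
        keys.add(_KEY_END.split(stripped, maxsplit=1)[0])
    return keys


def with_default(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key value` appended, unless the key is already set."""
    if key in configured_keys(conf_text):
        return conf_text  # keep whatever the user configured
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return f"{conf_text}{key} {value}\n"


def enable_hive_catalog(spark_home: str) -> Path:
    """Ensure spark.sql.catalogImplementation is set to hive in spark-defaults.conf."""
    conf_path = Path(spark_home) / "conf" / "spark-defaults.conf"
    old = conf_path.read_text(encoding="utf-8") if conf_path.exists() else ""
    new = with_default(old, "spark.sql.catalogImplementation", "hive")
    if new != old:
        conf_path.parent.mkdir(parents=True, exist_ok=True)
        conf_path.write_text(new, encoding="utf-8")
    return conf_path
```

Calling enable_hive_catalog(os.environ["SPARK_HOME"]) once is enough; running it again leaves the file unchanged.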