I installed Toree with pip and unzipped the Spark binaries to
/home/ebe/.bin/spark-2.3.0-bin-hadoop2.7
That path is stored in the $SPARK_HOME environment variable. I then ran the following command to install the Jupyter kernel:
jupyter toree install --spark_home=$SPARK_HOME/ --user
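For what it's worth, the kernelspec itself does get created and can be verified with:
jupyter kernelspec list
Since the --user flag was used, the kernel files should land under ~/.local/share/jupyter/kernels/.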
When I start Jupyter Notebook (or JupyterLab) and open a new Apache Spark Scala notebook, the kernel doesn't seem to activate. The following error messages appear in the console:
[I 10:56:44.388 LabApp] Creating new notebook in /
[I 10:56:44.873 LabApp] Kernel started: f65565b1-3570-48a2-be7e-2756a058e156
Starting Spark Kernel with SPARK_HOME=/home/ebe/.bin/spark-2.3.0-bin-hadoop2.7/
2018-06-01 10:56:45 WARN Utils:66 - Your hostname, Jackdaw resolves to a loopback address: 127.0.1.1; using 192.168.1.247 instead (on interface eno1)
2018-06-01 10:56:45 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-06-01 10:56:46 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-01 10:56:46 INFO Main$$anon$1:161 - Kernel version: 0.1.0-incubating
2018-06-01 10:56:46 INFO Main$$anon$1:162 - Scala version: Some(2.10.4)
2018-06-01 10:56:46 INFO Main$$anon$1:163 - ZeroMQ (JeroMQ) version: 3.2.5
2018-06-01 10:56:46 INFO Main$$anon$1:70 - Initializing internal actor system
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
at akka.actor.RootActorPath.$div(ActorPath.scala:185)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:465)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:453)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
at scala.util.Try$.apply(Try.scala:192)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at scala.util.Success.flatMap(Try.scala:231)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:585)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:578)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:109)
at org.apache.toree.boot.layer.StandardBareInitialization$class.createActorSystem(BareInitialization.scala:71)
at org.apache.toree.Main$$anon$1.createActorSystem(Main.scala:34)
at org.apache.toree.boot.layer.StandardBareInitialization$class.initializeBare(BareInitialization.scala:60)
at org.apache.toree.Main$$anon$1.initializeBare(Main.scala:34)
at org.apache.toree.boot.KernelBootstrap.initialize(KernelBootstrap.scala:70)
at org.apache.toree.Main$delayedInit$body.apply(Main.scala:39)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at org.apache.toree.Main$.main(Main.scala:23)
at org.apache.toree.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[W 10:56:54.895 LabApp] Timeout waiting for kernel_info reply from f65565b1-3570-48a2-be7e-2756a058e156
Why is the Scala version different when the kernel tries to start (Scala version: Some(2.10.4)) when the Scala version in the Spark binaries is 2.11? The Spark binaries themselves report:
Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_172
Even the Scala version on the command line is up to date:
$ scala -version
Scala code runner version 2.12.5-20180321-173609-unknown -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
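Another way to confirm which Scala version the Spark distribution actually bundles (independent of the system-wide Scala) is to list the scala-library jar it ships with; for the Spark 2.3.0 binaries this should show 2.11.8:
ls $SPARK_HOME/jars/ | grep scala-library
# expected output: scala-library-2.11.8.jar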
I tried installing different builds of Toree, only to arrive at the same problem. How can I solve this?
OS: Manjaro Linux.
This has been a huge pain for me as well. The issue seems to be that the latest released version of Toree does not yet support Spark 2.x: it was built against Scala 2.10 (hence the "Scala version: Some(2.10.4)" in your log), while the Spark 2.x binaries are built against Scala 2.11, and mixing the two produces exactly this kind of NoSuchMethodError.
The solution is to install it from source. This gist walks you through the steps to install it on Ubuntu: https://gist.github.com/mikecroucher/b57a9e5a4c1a1a2045f30a901b186bdf
The short version is:
Install sbt: https://www.scala-sbt.org/1.0/docs/Setup.html
git clone https://github.com/apache/incubator-toree
cd incubator-toree/
make dist
make release
Ignore the following error if you get it:
/bin/sh: 1: docker: not found
Makefile:212: recipe for target 'dist/toree-pip/toree-0.2.0.dev1.tar.gz' failed
make: *** [dist/toree-pip/toree-0.2.0.dev1.tar.gz] Error 127
Then:
cd dist/toree-pip/
python setup.py install
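Alternatively, assuming the build produced the tarball named in the Makefile error above (the exact filename may differ depending on the version you built), the package can be installed with pip from the same directory:
pip install toree-0.2.0.dev1.tar.gz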
Finally, you are ready to install Toree:
jupyter toree install --kernel_name=bespoke_spark --spark_home=/path/to/spark --user
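To sanity-check the new kernel, open a notebook with the bespoke_spark kernel and run a cell like the one below; it should now report a Scala version matching the one bundled with your Spark build (2.11.x for Spark 2.3.0):
// print the Scala version the kernel is running on
println(scala.util.Properties.versionString)
// print the Spark version via the preconfigured SparkContext
println(sc.version)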
As a bonus, remember to add:
spark.sql.catalogImplementation hive
to your Spark default configuration so that you can connect to Hive (if you need it).
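For example, assuming the stock configuration layout, that line goes into $SPARK_HOME/conf/spark-defaults.conf:
# in $SPARK_HOME/conf/spark-defaults.conf
spark.sql.catalogImplementation hive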