python, apache-spark, ipython, pyspark, apache-toree

Jupyter Notebook with Apache Spark (Kernel Error)


My objective is to use Jupyter Notebook (IPython) with Apache Spark, via Apache Toree. I set the SPARK_HOME environment variable and configured the Apache Toree installation with Jupyter. Everything seemed fine.
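(For reference, a typical Toree setup looks something like the following sketch; the Spark path is a placeholder, and the available --interpreters values depend on your Toree version:)

    export SPARK_HOME="<path to your Spark installation>"
    pip install toree
    jupyter toree install --spark_home=$SPARK_HOME --interpreters=PySpark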

When I run the command below, a Jupyter browser window opens:

    ipython notebook --profile=pyspark

[screenshot: Jupyter notebook dashboard]

When I choose Apache Toree - PySpark in the drop-down menu, I can't write any code in my notebook and I see the following view (the Python 2 kernel works fine):

[screenshot: notebook view after selecting Apache Toree - PySpark]

Clicking the red button gives:

[screenshot: Kernel Error details]

What's wrong? Any help is appreciated.


Solution

  • Not really an answer, but if you're not tied to Toree and just need a local Spark for learning and experimenting, you can download a copy of Spark, unzip it, and put this at the beginning of your notebook:

    import os
    import sys

    # Point Spark at the unzipped distribution and make its Python
    # libraries importable from the notebook.
    os.environ['SPARK_HOME'] = "<path where you have extracted the spark file>"
    sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'python'))
    sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'bin'))
    # The py4j version must match the one bundled with your Spark
    # distribution (look in $SPARK_HOME/python/lib).
    sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'python/lib/py4j-0.10.4-src.zip'))

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext, Row
    import pyspark.sql.functions as sql

    # Create the context and print the version to confirm Spark is wired up.
    sc = SparkContext()
    sqlContext = SQLContext(sc)
    print(sc.version)
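
    As a quick sanity check, assuming the paths above point at a valid Spark distribution, you can build a small DataFrame with the SQLContext created above and display it:

    df = sqlContext.createDataFrame([Row(word="hello", count=1),
                                     Row(word="spark", count=2)])
    df.show()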