I was looking at Apache Toree to use as a PySpark kernel for Jupyter
https://github.com/apache/incubator-toree
However, it was built against an older version of Spark (1.5.1 vs. the current 1.6.0). Instead, I tried the method described at http://arnesund.com/2015/09/21/spark-cluster-on-openstack-with-multi-user-jupyter-notebook/ by creating a kernel.json:
{
  "display_name": "PySpark",
  "language": "python",
  "argv": [
    "/usr/bin/python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "/usr/local/Cellar/apache-spark/1.6.0/libexec",
    "PYTHONPATH": "/usr/local/Cellar/apache-spark/1.6.0/libexec/python/:/usr/local/Cellar/apache-spark/1.6.0/libexec/python/lib/py4j-0.9-src.zip",
    "PYTHONSTARTUP": "/usr/local/Cellar/apache-spark/1.6.0/libexec/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": "--master local[*] pyspark-shell"
  }
}
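Note that those Homebrew paths are version-specific (the py4j zip name in particular changes between Spark releases), so it is worth checking that they actually exist, assuming the default Cellar location used above:

# the py4j zip referenced in PYTHONPATH must match what ships with Spark
ls /usr/local/Cellar/apache-spark/1.6.0/libexec/python/lib/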
However, I ran into a few problems:

1. There is no /jupyter/kernels path on my Mac, so I ended up creating ~/.jupyter/kernels/pyspark. I am not sure if that is the correct location.
2. Even with all the paths in place, PySpark still does not show up as a kernel inside Jupyter.

What did I miss?
Jupyter kernel specs should go under $JUPYTER_DATA_DIR/kernels, not under ~/.jupyter. On OS X, the data directory is ~/Library/Jupyter, so your spec belongs at ~/Library/Jupyter/kernels/pyspark/kernel.json. See: https://docs.jupyter.org/en/latest/use/jupyter-directories.html#data-files
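As a concrete fix, here is a minimal sketch, assuming the kernel.json above is saved in your current directory (the directory name pyspark is arbitrary, it just has to be unique):

# kernel specs live at $JUPYTER_DATA_DIR/kernels/<name>/kernel.json
mkdir -p ~/Library/Jupyter/kernels/pyspark
cp kernel.json ~/Library/Jupyter/kernels/pyspark/

# confirm that Jupyter picks the kernel up
jupyter kernelspec list

If the kernel still does not appear, running jupyter --paths shows exactly which data directories your installation searches.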