Tags: apache-spark, jupyter-notebook, apache-toree

Installing Spark packages in Apache Toree


I usually start my spark-shell with:

    ./bin/spark-shell --packages com.databricks:spark-csv_2.10:1.2.0,graphframes:graphframes:0.1.0-spark1.6,com.databricks:spark-avro_2.10:2.0.1

I'm now trying to use Apache Toree. Any idea how I should load these libraries in the notebook?

I tried the following:

    jupyter toree install --user --spark_home=/home/eron/spark-1.6.1/ --spark_opts="--packages com.databricks:spark-csv_2.10:1.2.0,graphframes:graphframes:0.1.0-spark1.6,com.databricks:spark-avro_2.10:2.0.1"

but that did not seem to work.


Solution

  • You can specify packages in the SPARK_OPTS environment variable, set in the shell from which you start Jupyter:

    export SPARK_OPTS='--packages com.databricks:spark-csv_2.10:1.4.0'
    

    Alternatively, adding the property to spark-defaults.conf also works:

    echo spark.jars.packages=com.databricks:spark-csv_2.10:1.4.0 | sudo tee -a $SPARK_HOME/conf/spark-defaults.conf
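Applied to all three packages from the question, the SPARK_OPTS approach might look like the sketch below. It assumes Toree was already installed with `jupyter toree install` and that the notebook server is started afterwards from the same shell, so the kernel inherits the variable. The `packages` and `joined` variable names are only illustrative. The key detail is that `--packages` takes a single comma-separated list with no spaces.

```shell
#!/usr/bin/env bash
# Sketch (assumption): Toree picks up SPARK_OPTS from the environment
# of the Jupyter process, so export it before launching Jupyter.

# Maven coordinates taken from the question (illustrative variable name).
packages=(
  com.databricks:spark-csv_2.10:1.2.0
  graphframes:graphframes:0.1.0-spark1.6
  com.databricks:spark-avro_2.10:2.0.1
)

# Join with commas (no spaces), the format --packages expects.
# IFS is changed only inside the command-substitution subshell.
joined=$(IFS=,; echo "${packages[*]}")

export SPARK_OPTS="--packages ${joined}"
echo "$SPARK_OPTS"

# Then start the notebook server from this same shell, e.g.:
#   jupyter notebook
```

If the variable is set after Jupyter is already running, restarting the notebook server (not just the kernel) from a shell where it is exported is likely necessary, because a child process only sees its parent's environment as it was at launch.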