I'm trying to set up a Dockerised version of Spark and Zeppelin, but I can't work out how to make Zeppelin use Spark 3.x.
I'm using the default Zeppelin image from Docker Hub. Here's an excerpt from my docker-compose.yml:
zeppelin:
  image: apache/zeppelin:0.9.0
  container_name: zeppelin
  #depends_on:
  #  - spark-master
  ports:
    - "8083:8080"
If I access Zeppelin (at localhost:8083) and execute spark.version, it still reads the version as 2.4.5.
How do I change the Spark version in Zeppelin? I can see a fair number of supported versions, but the docs don't clarify how to switch between them: https://github.com/apache/zeppelin/blob/master/spark/spark-shims/src/main/scala/org/apache/zeppelin/spark/SparkVersion.java#L25
You can run Spark in a separate container and point Zeppelin's Spark master at it. Another easy way is to build your own image that adds Spark on top of Zeppelin.
Create a Dockerfile with Zeppelin as the base image:
FROM apache/zeppelin:0.9.0
ENV SPARK_VERSION=3.0.0
ENV HADOOP_VERSION=3.2
ENV SPARK_INSTALL_ROOT=/spark
ENV SPARK_HOME=${SPARK_INSTALL_ROOT}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}
USER root
RUN mkdir "${SPARK_INSTALL_ROOT}"
USER $USER
RUN cd "${SPARK_INSTALL_ROOT}" && \
    wget --show-progress https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop${HADOOP_VERSION}.tgz && \
    tar -xzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
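Before building, it can help to check what those ENV values expand to, since a typo in either version number makes the download fail. The following is a minimal sketch in plain shell that mirrors the variables from the Dockerfile above. SPARK_DIST and SPARK_URL are illustrative names that do not appear in the Dockerfile, and the commented-out check assumes curl is installed on your machine.

```shell
#!/bin/sh
# Same values as the ENV lines in the Dockerfile above.
SPARK_VERSION=3.0.0
HADOOP_VERSION=3.2
SPARK_INSTALL_ROOT=/spark

# Derived names, assembled the same way the Dockerfile assembles them.
# SPARK_DIST and SPARK_URL are helper names introduced for this sketch only.
SPARK_DIST="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
SPARK_HOME="${SPARK_INSTALL_ROOT}/${SPARK_DIST}"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_DIST}.tgz"

echo "SPARK_HOME=${SPARK_HOME}"
echo "SPARK_URL=${SPARK_URL}"

# Optional: confirm the archive exists before building (assumes curl is available).
# curl -fsSLI "${SPARK_URL}" >/dev/null && echo "archive found"
```

If you want a different Spark release, change SPARK_VERSION and HADOOP_VERSION here first and confirm the printed URL resolves, then carry the same values over to the Dockerfile.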
Now use this image with docker-compose. You can either build the image with a tag and reference it here, or refer to the Dockerfile directly, as below:
version: '2'
services:
  zeppelin:
    image: zeppelin-spark
    build:
      context: .
      dockerfile: Dockerfile
    container_name: zeppelin
    ports:
      - "8083:8080"
Now run docker-compose up -d
Make sure both files are in the same directory, or adjust the path in context accordingly.
Executing spark.version in Zeppelin should then report 3.0.0.