I’m trying to deploy a SparkApplication using the Kubernetes Spark Operator. I built a custom Docker image for my Spark job, and I’m encountering an issue where the driver pod cannot find the JAR file that’s supposed to be included in the image.
Here’s my Dockerfile:
FROM bitnami/spark:3.5.3
# Place the application JAR in a dedicated work dir inside the image
WORKDIR /opt/spark/work-dir
COPY target/scala-2.12/app.jar /opt/spark/work-dir/
USER root
# Make the JAR readable/writable by whichever user the driver runs as
RUN chmod 777 /opt/spark/work-dir/app.jar
EXPOSE 8080
I built and pushed the image using these commands:
docker buildx build --platform=linux/amd64 -t repo/image:TAG .
docker push repo/image:TAG
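To rule out a stale push or a wrong-architecture manifest on the registry side, the pushed tag can also be inspected directly (this assumes buildx is available; repo/image:TAG is the same placeholder as above):

docker buildx imagetools inspect repo/image:TAG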
When I inspect the image locally using:
docker run --rm -it repo/image:TAG /bin/bash
I can see that the JAR file exists in the expected directory:
-> pwd
/opt/spark/work-dir
-> ls
app.jar
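The same check also works non-interactively, and shows the file mode and owner at the same time:

docker run --rm repo/image:TAG ls -l /opt/spark/work-dir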
Next, I deploy the Spark application using this YAML file:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: $APP
  namespace: $NAMESPACE
spec:
  type: Scala
  mode: cluster
  image: repo/image:TAG
  imagePullPolicy: Always
  mainClass: com.org.app.api.Api
  mainApplicationFile: "local:///opt/spark/work-dir/app.jar"
  sparkVersion: "3.5.3"
  driver:
    cores: 2
    memory: "2G"
    serviceAccount: spark
  executor:
    cores: 4
    instances: 2
    memory: "4G"
  sparkConf:
    "spark.kubernetes.container.image.pullPolicy": "Always"
    "spark.kubernetes.namespace": "namespace-name"
However, when I describe the driver pod or check its logs, I see the following error:
Files local:///opt/spark/work-dir/app.jar from /opt/spark/work-dir/app.jar to /opt/spark/work-dir/app.jar
Exception in thread "main" java.nio.file.NoSuchFileException: /opt/spark/work-dir/app.jar
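For completeness, this is how I get that output (the operator names the driver pod <app-name>-driver by default):

kubectl -n $NAMESPACE describe pod $APP-driver
kubectl -n $NAMESPACE logs $APP-driver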
Things I've tried
I resolved the issue by changing the directory where I copy the Spark application JAR file to /opt/bitnami/spark/examples/jars/.
Here’s the updated Dockerfile:
FROM bitnami/spark:3.5.3
# Copy the application JAR into examples/jars instead of a custom work dir
COPY target/scala-2.12/app.jar /opt/bitnami/spark/examples/jars/
USER root
# Blunt, but rules out any permission problems while debugging
RUN chmod -R 777 /opt
EXPOSE 8080
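For the fix to actually take effect, mainApplicationFile has to point at the new location as well, so the corresponding line in the SparkApplication spec becomes:

  mainApplicationFile: "local:///opt/bitnami/spark/examples/jars/app.jar"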
It appears that the Spark Operator or the Bitnami Spark image expects application JAR files in a particular location, and /opt/bitnami/spark/examples/jars/ satisfies it. That said, the log line above ("from /opt/spark/work-dir/app.jar to /opt/spark/work-dir/app.jar") makes me suspect something else: spark-submit seems to copy local:// files into the driver’s current working directory, and since my JAR already lived in that working directory, the copy went from the path onto itself and then failed with NoSuchFileException. If that’s right, moving the JAR out of the working directory is what fixed it, not the specific target directory. Either way, after making this change the driver was able to locate the JAR file without any issues.

If anyone has insights into which explanation is correct, or whether /opt/bitnami/spark/examples/jars/ is documented as a default location for application JARs in this image, I’d be interested to learn more!