On my local server I have a Spark cluster running in standalone mode, and I have a Spring Boot Spark job which I submit using the following command:
spark-submit --conf "spark.driver.userClassPathFirst=true" --conf "spark.executor.userClassPathFirst=true" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///log4j2.xml -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m --add-opens=java.base/sun.nio.ch=ALL-UNNAMED -Dlog4j.debug" --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///log4j2.xml --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m -Dlog4j.debug" --driver-java-options "-Xms4096m -XX:+UseG1GC -XX:G1HeapRegionSize=32M --add-opens=java.base/sun.nio.ch=ALL-UNNAMED" --master spark://localhost:7077 --deploy-mode cluster --num-executors 1 --executor-cores 4 --executor-memory 4096m --driver-memory 4096m --conf "spark.driver.memory=4096m" --conf "spark.dynamicAllocation.enabled=true" operatordc1-0.0.1-SNAPSHOT.jar
It runs well and does its job successfully. I then created a Spark operator in my Kubernetes cluster using this image: ghcr.io/kubeflow/spark-operator:v1beta2-1.4.3-3.5.0. The operator was created successfully, and now I want to run the same job that I ran on my standalone Spark cluster on Kubernetes through the Spark operator. For that purpose I created this YAML file:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: operatordc1
  namespace: spark-operator
spec:
  type: Java
  mode: cluster
  image: "focode/spark-custom:release-1.0"
  imagePullPolicy: Always
  mainApplicationFile: "local:///opt/spark/examples/jars/operatordc1-0.0.1-SNAPSHOT.jar"
  sparkVersion: "3.4.2"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    coreLimit: "1000m"
    memory: "1024m"
    javaOptions: >-
      -Dlog4j.configuration=file:///log4j2.xml
      --add-opens=java.base/java.lang=ALL-UNNAMED
      --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
      --add-opens=java.base/java.nio=ALL-UNNAMED
      --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
      --add-opens=java.base/java.util=ALL-UNNAMED
      --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
      --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED
      -XX:+UseG1GC
      -XX:G1HeapRegionSize=32M
      -XX:ReservedCodeCacheSize=100M
      -XX:MaxMetaspaceSize=256m
      -XX:CompressedClassSpaceSize=256m
      -Xms1024m
      -Dlog4j.debug
    labels:
      version: "3.4.2"
    serviceAccount: default
  executor:
    cores: 4
    instances: 1
    memory: "1024m"
    javaOptions: >-
      -Dlog4j.configuration=file:///log4j2.xml
      -XX:ReservedCodeCacheSize=100M
      -XX:MaxMetaspaceSize=256m
      -XX:CompressedClassSpaceSize=256m
      --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
      -Dlog4j.debug
    labels:
      version: "3.4.2"
    serviceAccount: default
  sparkConf:
    "spark.driver.userClassPathFirst": "true"
    "spark.executor.userClassPathFirst": "true"
    "spark.driver.memory": "1024m"
    "spark.executor.memory": "1024m"
    "spark.dynamicAllocation.enabled": "true"
I run it using this command:
kubectl apply -f spark-job-poc-1.yaml -n spark-operator
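The outcome of the submission is recorded in the SparkApplication's status and events, so the failure can be inspected with commands such as:

kubectl get sparkapplication operatordc1 -n spark-operator
kubectl describe sparkapplication operatordc1 -n spark-operator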
Unfortunately, this gives me the following error:
failed to submit SparkApplication operatordc1: failed to run spark-submit for SparkApplication spark-operator/operatordc1:
24/04/27 10:19:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'No FileSystem for scheme "local"'. Please specify one with --class.
        at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:1047)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:528)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This means spark-submit is not able to determine the main class of my Spring Boot job.
I have uploaded the source code of my Spring Boot job at this location: https://github.com/focode/operatordc1
I have tried specifying the main class with mainClass: "com.dcpoc1.operator.operatordc1.Operatordc1Application" alongside mainApplicationFile, but it failed with the same exception. What also concerns me is that when I submit the job to my standalone Spark cluster with spark-submit, I don't pass any class parameter at all, yet the Kubernetes operator forces me to add one for my Spring Boot job.
I want a resolution for the error stating that the main class of the job cannot be found.
I got the resolution: I used this as the main class: mainClass: "org.springframework.boot.loader.JarLauncher"
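This makes sense for a Spring Boot repackaged ("fat") jar: its manifest Main-Class is Spring Boot's launcher rather than the application class, and the launcher then loads the class named in the Start-Class entry. It also explains why the standalone spark-submit worked without a class parameter: there, spark-submit could open the jar and read the launcher class from the manifest, whereas with the local:// scheme on Kubernetes it cannot, so it asks for the class explicitly. The manifest can be checked locally (assuming a standard spring-boot-maven-plugin build; the launcher package differs in newer Spring Boot versions):

unzip -p operatordc1-0.0.1-SNAPSHOT.jar META-INF/MANIFEST.MF
# typical entries for a Spring Boot fat jar:
# Main-Class: org.springframework.boot.loader.JarLauncher
# Start-Class: com.dcpoc1.operator.operatordc1.Operatordc1Application

With that, the only change to the YAML above is the extra spec-level field:

spec:
  mainClass: "org.springframework.boot.loader.JarLauncher"
  mainApplicationFile: "local:///opt/spark/examples/jars/operatordc1-0.0.1-SNAPSHOT.jar"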