apache-spark kubernetes pyspark python-3.12 kubernetes-rbac

Apache spark-submit to a local k8s cluster fails despite having the necessary edit RBAC setup


I have created the necessary ServiceAccount and ClusterRoleBinding with the edit ClusterRole (roughly the commands sketched after the output below), but submitting to a local k8s cluster still fails:

$ spark-submit --master k8s://https://127.0.0.1:16443 --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=spark:python3 --conf spark.kubernetes.file.upload.path=/tmp --conf spark.kubernetes.authenticate.driver.serviceAccountName=sa-apache-spark --deploy-mode cluster TextFile.py 
25/03/07 16:20:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/03/07 16:20:57 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
25/03/07 16:20:58 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
25/03/07 16:20:58 INFO KubernetesUtils: Uploading file: /usr/src/Python/PySpark/TextFile.py to dest: /tmp/spark-upload-a551a9f4-6433-4674-bdc2-d117401c50ee/TextFile.py...
25/03/07 16:21:39 ERROR Client: Please check "kubectl auth can-i create pod" first. It should be yes.
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
    at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1108)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:92)
    at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$6(KubernetesClientApplication.scala:256)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$6$adapted(KubernetesClientApplication.scala:250)
    at org.apache.spark.util.SparkErrorUtils.tryWithResource(SparkErrorUtils.scala:48)
    at org.apache.spark.util.SparkErrorUtils.tryWithResource$(SparkErrorUtils.scala:46)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:94)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:250)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:223)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1034)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:199)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:222)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1125)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1134)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: timeout
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:515)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535)
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:340)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:703)
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:92)
    at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42)
    ... 17 more
Caused by: java.io.InterruptedIOException: timeout
    at okhttp3.RealCall.timeoutExit(RealCall.java:108)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:205)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.net.SocketException: Socket closed
    at java.base/sun.nio.ch.NioSocketImpl.endConnect(NioSocketImpl.java:531)
    at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:604)
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
    at java.base/java.net.Socket.connect(Socket.java:751)
    at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
    at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
    at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
    ... 4 more
25/03/07 16:21:39 INFO ShutdownHookManager: Shutdown hook called
25/03/07 16:21:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-36873a8d-eb37-43e5-bacc-032e639773eb
$ kubectl auth can-i create pod
yes
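
For reference, the ServiceAccount and ClusterRoleBinding were created with something like the following; the binding name and the default namespace are placeholders here, and the ServiceAccount name matches the one passed to spark-submit:

$ kubectl create serviceaccount sa-apache-spark
$ kubectl create clusterrolebinding spark-edit-binding --clusterrole=edit --serviceaccount=default:sa-apache-spark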

Solution

  • I faced the same issue when I was trying to spark-submit to the local cluster from outside. Running the command below fixed it for me:

    kubectl proxy
    

    The above command starts an authenticating proxy for communicating with the Kubernetes API.
    The local proxy listens on localhost:8001, so --master k8s://http://127.0.0.1:8001 can be passed to spark-submit instead of --master k8s://https://127.0.0.1:6443.
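
    To confirm the proxy is reachable before re-running spark-submit, you can hit the API's /version endpoint through it, for example:

    curl http://127.0.0.1:8001/version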

    Full example command:

    $SPARK_HOME/bin/spark-submit \
    --master k8s://http://127.0.0.1:8001 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.container.image=ghcr.io/dulshanr/spark-py:1.0 \
    --conf spark.kubernetes.namespace=spark \
    local:///opt/spark/examples/jars/spark-examples_2.13-4.0.0.jar
    

    Source:

    https://spark.apache.org/docs/latest/running-on-kubernetes.html#cluster-mode

    I believe the RBAC resources you created originally will not be used properly unless you run the proxy.
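
    As an extra check, it may help to verify the permissions of the ServiceAccount itself rather than your own kubectl user, via impersonation (the default namespace here is an assumption; impersonation requires that your user is allowed to impersonate, which is usually the case with a local admin kubeconfig):

    kubectl auth can-i create pods --as=system:serviceaccount:default:sa-apache-spark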