apache-spark kubernetes airflow rbac

Service account cannot get resource \"sparkapplications/status\" in API group \"sparkoperator.k8s.io\"


I updated my Airflow from version 2.3.0 to 2.9.3. Now, when I try to run a Spark job with the SparkKubernetesOperator, I get this error:

[2024-08-30, 13:57:04 UTC] {spark_kubernetes.py:282} INFO - Creating sparkApplication.
[2024-08-30, 13:57:04 UTC] {base.py:84} INFO - Using connection ID 'kubernetes_default' for task execution.
[2024-08-30, 13:57:04 UTC] {custom_object_launcher.py:312} ERROR - Exception when attempting to create spark job
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/operators/custom_object_launcher.py", line 300, in start_spark_job
    while self.spark_job_not_running(self.spark_obj_spec):
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/operators/custom_object_launcher.py", line 319, in spark_job_not_running
    spark_job_info = self.custom_obj_api.get_namespaced_custom_object_status(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/api/custom_objects_api.py", line 1927, in get_namespaced_custom_object_status
    return self.get_namespaced_custom_object_status_with_http_info(group, version, namespace, plural, name, **kwargs)  # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/api/custom_objects_api.py", line 2034, in get_namespaced_custom_object_status_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/rest.py", line 244, in GET
    return self.request("GET", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/rest.py", line 238, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b822ab33-691b-49bf-bdfd-5e355c465c0e', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '21960dd0-b9d3-43db-abbc-680540cb1fb8', 'X-Kubernetes-Pf-Prioritylevel-Uid': '2b2dc621-5a20-4eef-a225-8b780bc64442', 'Date': 'Fri, 30 Aug 2024 13:57:04 GMT', 'Content-Length': '487'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"sparkapplications.sparkoperator.k8s.io \"spark-job-test-12345\" is forbidden: User \"system:serviceaccount:airflow-v2:airflow-worker\" cannot get resource \"sparkapplications/status\" in API group \"sparkoperator.k8s.io\" in the namespace \"airflow-v2\"","reason":"Forbidden","details":{"name":"spark-job-test-12345","group":"sparkoperator.k8s.io","kind":"sparkapplications"},"code":403}

I already created a cluster role and bound it to the airflow-worker service account. Describing the role shows:

Resources                                       Non-Resource URLs                     Resource Names     Verbs
sparkapplications.sparkoperator.k8s.io          []                                    []                 [*]

Does anyone have an idea of what could be happening, or how I can debug it?
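One way to debug a 403 like this is to ask the API server directly with `kubectl auth can-i`, impersonating the worker's service account (the namespace and service-account names below match this setup; adjust them for yours):

```shell
# Can the service account read the SparkApplication object itself?
kubectl auth can-i get sparkapplications.sparkoperator.k8s.io \
  --as=system:serviceaccount:airflow-v2:airflow-worker \
  -n airflow-v2

# Can it read the status subresource, which the operator polls?
kubectl auth can-i get sparkapplications.sparkoperator.k8s.io \
  --subresource=status \
  --as=system:serviceaccount:airflow-v2:airflow-worker \
  -n airflow-v2
```

If the first command prints `yes` but the second prints `no`, the role covers the resource but not its `status` subresource, which matches the error message above.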

This is my cluster role:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-cluster-cr
  labels:
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
rules:
- apiGroups:
  - sparkoperator.k8s.io
  resources:
  - sparkapplications
  verbs:
  - "*"

This is my cluster role binding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: airflow-v2-spark-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: spark-cluster-cr
subjects:
- kind: ServiceAccount
  name: airflow-worker
  namespace: airflow-v2

Solution

  • So, I found out that you just need to explicitly include the status subresource in the resources of the cluster role:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: spark-cluster-cr
      labels:
        rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
    rules:
    - apiGroups:
      - sparkoperator.k8s.io
      resources:
      - sparkapplications
      - sparkapplications/status
      - sparkapplications/status
      verbs:
      - "*"