I upgraded my Airflow from version 2.3.0 to 2.9.3. Now, when I try to run a Spark job with the SparkKubernetesOperator, I get this error:
[2024-08-30, 13:57:04 UTC] {spark_kubernetes.py:282} INFO - Creating sparkApplication.
[2024-08-30, 13:57:04 UTC] {base.py:84} INFO - Using connection ID 'kubernetes_default' for task execution.
[2024-08-30, 13:57:04 UTC] {custom_object_launcher.py:312} ERROR - Exception when attempting to create spark job
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/operators/custom_object_launcher.py", line 300, in start_spark_job
while self.spark_job_not_running(self.spark_obj_spec):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/operators/custom_object_launcher.py", line 319, in spark_job_not_running
spark_job_info = self.custom_obj_api.get_namespaced_custom_object_status(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/api/custom_objects_api.py", line 1927, in get_namespaced_custom_object_status
return self.get_namespaced_custom_object_status_with_http_info(group, version, namespace, plural, name, **kwargs) # noqa: E501
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/api/custom_objects_api.py", line 2034, in get_namespaced_custom_object_status_with_http_info
return self.api_client.call_api(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
response_data = self.request(
^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 373, in request
return self.rest_client.GET(url,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/rest.py", line 244, in GET
return self.request("GET", url,
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/kubernetes/client/rest.py", line 238, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b822ab33-691b-49bf-bdfd-5e355c465c0e', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '21960dd0-b9d3-43db-abbc-680540cb1fb8', 'X-Kubernetes-Pf-Prioritylevel-Uid': '2b2dc621-5a20-4eef-a225-8b780bc64442', 'Date': 'Fri, 30 Aug 2024 13:57:04 GMT', 'Content-Length': '487'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"sparkapplications.sparkoperator.k8s.io \"spark-job-test-12345\" is forbidden: User \"system:serviceaccount:airflow-v2:airflow-worker\" cannot get resource \"sparkapplications/status\" in API group \"sparkoperator.k8s.io\" in the namespace \"airflow-v2\"","reason":"Forbidden","details":{"name":"spark-job-test-12345","group":"sparkoperator.k8s.io","kind":"sparkapplications"},"code":403}
I have already set up a ClusterRole and a ClusterRoleBinding for the airflow-worker service account:
Resources                                Non-Resource URLs  Resource Names  Verbs
sparkapplications.sparkoperator.k8s.io   []                 []              [*]
Does anyone have an idea of what could be happening, or how I can debug it?
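For reference, the failure can be reproduced from the command line with kubectl auth can-i, impersonating the worker service account (this assumes your kubeconfig user is allowed to impersonate service accounts):

kubectl auth can-i get sparkapplications.sparkoperator.k8s.io \
  --as=system:serviceaccount:airflow-v2:airflow-worker \
  -n airflow-v2
# yes

kubectl auth can-i get sparkapplications.sparkoperator.k8s.io \
  --subresource=status \
  --as=system:serviceaccount:airflow-v2:airflow-worker \
  -n airflow-v2
# no  <- the same access the API server denied with the 403 above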
This is my cluster role:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-cluster-cr
  labels:
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
rules:
- apiGroups:
  - sparkoperator.k8s.io
  resources:
  - sparkapplications
  verbs:
  - "*"
This is my cluster role binding:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: airflow-v2-spark-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: spark-cluster-cr
subjects:
- kind: ServiceAccount
  name: airflow-worker
  namespace: airflow-v2
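Both objects are applied in the cluster; the wiring can be sanity-checked like this (object names as defined above):

# Confirm the role's rules, and that the binding points at the right
# ClusterRole and service account.
kubectl describe clusterrole spark-cluster-cr
kubectl describe clusterrolebinding airflow-v2-spark-crb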
So, I found out that you just need to explicitly include the status subresource in the resources of the cluster role. In Kubernetes RBAC, a grant on a resource does not cover its subresources, even with wildcard verbs, so sparkapplications/status has to be listed on its own. That subresource is exactly what the operator queries: the traceback above shows it calling get_namespaced_custom_object_status, which reads the /status endpoint of the custom object. The working ClusterRole:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-cluster-cr
  labels:
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
rules:
- apiGroups:
  - sparkoperator.k8s.io
  resources:
  - sparkapplications
  - sparkapplications/status
  verbs:
  - "*"