python, google-cloud-platform, airflow, spark-submit

Error while trying to get Spark connection ID in Airflow


My SparkSubmitOperator in my Airflow DAG looks like the snippet below. My connection ID 'spark_local' is configured in the Airflow UI as an Apache Spark connection. When I try running my DAG, I get a TypeError from Airflow. Can anyone please help me if I am missing anything here?

SparkSubmitOperator(
    task_id='spark_task',
    application='gs://xxx/xxx.jar',
    conf={"spark.driver.allowMultipleContexts": True, "spark.blacklist.enabled": False},
    conn_id='spark_local',
    java_class='xxx',
    jars=["gs://xxx/*"],
    application_args=["xxx", "xxx", "xxx"],
)


Solution

  • Based on the TypeError, the documentation at https://airflow.apache.org/docs/apache-airflow/1.10.12/_api/airflow/contrib/operators/spark_submit_operator/index.html says that the jars argument should be a string. Could you try jars="gs://xxx/*" as a string instead of a list to see if it works? See the sketch below.
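Here is a minimal sketch of the corrected operator, reusing the placeholder values ('xxx', gs:// paths) from your question; the only change is passing jars as a single comma-separated string rather than a list:

    from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

    # Same operator as in the question, with jars passed as a string
    # (per the Airflow 1.10.12 docs) instead of a list.
    spark_task = SparkSubmitOperator(
        task_id='spark_task',
        application='gs://xxx/xxx.jar',
        conf={"spark.driver.allowMultipleContexts": True,
              "spark.blacklist.enabled": False},
        conn_id='spark_local',
        java_class='xxx',
        jars="gs://xxx/*",  # string, not ["gs://xxx/*"]
        application_args=["xxx", "xxx", "xxx"],
    )

If you need several jar locations, join them into one string separated by commas (e.g. "gs://a.jar,gs://b.jar"), since the operator forwards this value to spark-submit's --jars option.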