While doing spark-submit, gcloud gives an option to use --properties-file to pass cluster properties and Spark configurations. I am not sure how to use it while running the job.
Create a .txt file with any name. Each property goes on its own line, as shown below:
spark.hadoop.hive.metastore.uris=ip1,ip2,ip3
spark.submit.deployMode=cluster
spark.yarn.appMasterEnv.PYTHONPATH=some_path
spark.executorEnv.PYTHONPATH=some_path
In your spark-submit, pass this .txt file as shown below:
gcloud dataproc jobs submit pyspark main.py --cluster=<cluster_name> --region=<region_name> --properties-file=<path to .txt file>
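Putting the two steps together, here is a minimal end-to-end sketch. The file name, cluster name, region, and PYTHONPATH value are placeholders; the gcloud invocation is shown commented out since it requires an authenticated gcloud CLI and an existing Dataproc cluster:

```shell
# Write the properties file; one key=value pair per line,
# the same format as spark-defaults.conf.
cat > spark-props.txt <<'EOF'
spark.submit.deployMode=cluster
spark.yarn.appMasterEnv.PYTHONPATH=/opt/my_libs
spark.executorEnv.PYTHONPATH=/opt/my_libs
EOF

# Submit the job, pointing --properties-file at the file
# (uncomment and fill in your own cluster/region to run):
# gcloud dataproc jobs submit pyspark main.py \
#   --cluster=my-cluster \
#   --region=us-central1 \
#   --properties-file=spark-props.txt
```

Properties set this way are applied to the job the same as individual `--properties` flags, but keeping them in a file makes them easier to version and reuse across jobs.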