apache-spark gcloud spark-submit dataproc

How to use the --properties-file flag in Dataproc?


When submitting a Spark job, gcloud offers a --properties-file flag for passing job properties such as Spark configuration. I am not sure how to use it when running the job.


Solution

  • Create a text file (for example a .txt file; any name works). Put one property per line, in property=value form, as shown below.

    spark.hadoop.hive.metastore.uris=ip1,ip2,ip3
    spark.submit.deployMode=cluster
    spark.yarn.appMasterEnv.PYTHONPATH=some_path
    spark.executorEnv.PYTHONPATH=some_path
    

  • Pass this file to the job with the --properties-file flag:

    gcloud dataproc jobs submit pyspark --cluster=<cluster_name> --region=<region_name> main.py --properties-file <path to .txt file>
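A minimal end-to-end sketch of the two steps above as a POSIX shell script. The file name job-properties.txt, the helper function submit_pyspark, and the DRY_RUN switch are hypothetical, added only for illustration. The property values are the answer's placeholders, and the only gcloud flags used are the ones from the command above.

```shell
#!/bin/sh
set -eu

# Hypothetical file name; any name works.
PROPS_FILE="job-properties.txt"

# Step 1: write one property=value pair per line (placeholder values).
cat > "$PROPS_FILE" <<'EOF'
spark.hadoop.hive.metastore.uris=ip1,ip2,ip3
spark.submit.deployMode=cluster
spark.yarn.appMasterEnv.PYTHONPATH=some_path
spark.executorEnv.PYTHONPATH=some_path
EOF

# Optional sanity check: print any non-empty line that is not key=value
# and stop. grep -v exits 0 only if it found such a line.
if grep -nvE '^[^=[:space:]]+=.*$|^$' "$PROPS_FILE"; then
  echo "malformed lines found in $PROPS_FILE" >&2
  exit 1
fi

# Step 2: hypothetical helper that submits a PySpark job with the file.
# Arguments: cluster, region, main script, properties file.
# Set DRY_RUN=1 to print the command instead of running it.
submit_pyspark() {
  # The arguments are expanded before set -- replaces them.
  set -- gcloud dataproc jobs submit pyspark "$3" \
    --cluster="$1" --region="$2" --properties-file="$4"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Example (replace the placeholders with real values):
# submit_pyspark <cluster_name> <region_name> main.py "$PROPS_FILE"
```

Running the script only writes and checks the file and defines the helper. Call submit_pyspark with your real cluster name and region to submit the job, or prefix the call with DRY_RUN=1 to inspect the exact command first.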