python apache-spark configuration pyspark dynamic-allocation

How to change the Spark setting to allow spark.dynamicAllocation.enabled?


I'm running a Python script in PySpark and got the following error: NameError: name 'spark' is not defined

I looked it up and found that the reason is that spark.dynamicAllocation.enabled is not enabled yet.

According to the Mastering Apache Spark gitbook (https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-dynamic-allocation.html#spark_dynamicAllocation_enabled): spark.dynamicAllocation.enabled (default: false) controls whether dynamic allocation is enabled or not. It is assumed that spark.executor.instances is not set or is 0 (which is the default value).

Since the default setting is false, I need to change the Spark setting to enable spark.dynamicAllocation.enabled.
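
For reference, here is a minimal way to check the current value from PySpark (in the pyspark shell the spark session object is predefined; the "false" fallback passed to get() just mirrors the documented default):

    from pyspark.sql import SparkSession

    # Reuse the shell's session, or create one when running as a script.
    spark = SparkSession.builder.getOrCreate()

    # Returns the configured value, or "false" (the documented default)
    # if the property has never been set.
    print(spark.conf.get("spark.dynamicAllocation.enabled", "false"))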

I installed Spark with Homebrew and didn't change any of its configuration.

How can I change the setting and enable spark.dynamicAllocation.enabled?

Thanks a lot.


Solution

  • There are several places you can set it. If you would like to enable it on a per-job basis, set the following in each application:

    conf.set("spark.dynamicAllocation.enabled","true")
    

    If you want to set it for all jobs, edit the spark-defaults.conf file instead. In the Hortonworks distro it should be under

    /usr/hdp/current/spark-client/conf/
    

    Add the setting to your spark-defaults.conf and you should be good to go; a minimal sketch of both approaches is shown below.
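
    Below is a minimal PySpark sketch of the per-application approach, assuming a SparkSession-based script (Spark 2.x+); the min/max executor values are only example numbers, and the shuffle-service line reflects the fact that dynamic allocation typically also requires the external shuffle service:

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Enable dynamic allocation for this application only.
    conf = (
        SparkConf()
        .set("spark.dynamicAllocation.enabled", "true")
        # Dynamic allocation typically also needs the external
        # shuffle service to be running on the cluster.
        .set("spark.shuffle.service.enabled", "true")
        # Example bounds; tune these for your cluster.
        .set("spark.dynamicAllocation.minExecutors", "1")
        .set("spark.dynamicAllocation.maxExecutors", "4")
    )

    spark = SparkSession.builder.config(conf=conf).getOrCreate()

    # Read the setting back to confirm it took effect.
    print(spark.conf.get("spark.dynamicAllocation.enabled"))

    For the cluster-wide approach, the same keys go into spark-defaults.conf one per line, separated from their values by whitespace, e.g. spark.dynamicAllocation.enabled true.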