I'm running a python script in pyspark and got the following error: NameError: name 'spark' is not defined
I looked it up and found that the likely reason is that spark.dynamicAllocation.enabled is not enabled yet.
According to Spark's documentation (https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-dynamic-allocation.html#spark_dynamicAllocation_enabled), spark.dynamicAllocation.enabled (default: false) controls whether dynamic allocation is enabled or not. It is assumed that spark.executor.instances is not set or is 0 (which is the default value).
Since the default setting is false, I need to change the Spark configuration to enable spark.dynamicAllocation.enabled.
I installed Spark with brew and didn't change its configuration.
How can I change the setting and enable spark.dynamicAllocation.enabled?
Thanks a lot.
There are several places you can set it. If you would like to enable it on a per-job basis, set the following in each application:
conf.set("spark.dynamicAllocation.enabled","true")
If you want to set it for all jobs, add it to the spark-defaults.conf file. In the Hortonworks distro it should be under
/usr/hdp/current/spark-client/conf/
Add the setting to your spark-defaults.conf and you should be good to go.
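For reference, spark-defaults.conf takes one key-value pair per line, separated by whitespace. A sketch of the relevant entries (the min/max executor bounds are hypothetical examples, not required values):

spark.dynamicAllocation.enabled        true
spark.shuffle.service.enabled          true
spark.dynamicAllocation.minExecutors   1
spark.dynamicAllocation.maxExecutors   10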