I'm spinning up a proof of concept with Spark Standalone on Kubernetes, with JupyterHub.
I want dynamic allocation, because my users will frequently walk away from the keyboard with their application (notebook) in the 'running' state but with no tasks or jobs, while the driver sits waiting for work.
Dynamic allocation does not seem to kick in. From the documentation, it appears to be waiting on:
spark.dynamicAllocation.executorIdleTimeout=60s
But what is the definition of "idle"? To me, these notebooks would seem to be idle.
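For context, here is a minimal sketch of how I understand dynamic allocation would be enabled from a notebook session; the master URL and app name are placeholders, and spark.dynamicAllocation.shuffleTracking.enabled assumes Spark 3.x (older versions need the external shuffle service, spark.shuffle.service.enabled=true, instead):

    from pyspark.sql import SparkSession

    # Sketch: enable dynamic allocation for a notebook-driven application.
    spark = (
        SparkSession.builder
        .master("spark://spark-master:7077")                            # placeholder standalone master URL
        .appName("notebook-poc")                                        # placeholder app name
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
        .config("spark.dynamicAllocation.executorIdleTimeout", "60s")   # remove executors idle for 60s
        .getOrCreate()
    )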
There are a few possible reasons for this.
If your users are setting the number of executors explicitly, those executors will never be removed. What you should set instead is the minimum number of executors: spark.dynamicAllocation.minExecutors
In my case we set it to 2, which lets the data scientists keep a minimum number of executors for their job even when the cluster is full.
So first check that the '--num-executors' option has been removed and replaced with spark.dynamicAllocation.minExecutors (see the sketch below).
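A minimal sketch, assuming the notebooks build their own SparkSession; the bounds here are illustrative values, not recommendations:

    from pyspark.sql import SparkSession

    # Sketch: instead of pinning a fixed executor count (--num-executors /
    # spark.executor.instances), set a floor and a cap and let dynamic
    # allocation scale between them.
    spark = (
        SparkSession.builder
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "2")    # executors kept even when the notebook is idle
        .config("spark.dynamicAllocation.maxExecutors", "10")   # illustrative upper bound
        .getOrCreate()
    )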
Another reason executors are not removed is cached data. If your data scientist has cached data, check the option spark.dynamicAllocation.cachedExecutorIdleTimeout.
We didn't change this for our use case, because, as the documentation says:
by default, executors containing cached data are never removed
If you want those executors reclaimed as well, change spark.dynamicAllocation.cachedExecutorIdleTimeout (a sketch follows).
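A sketch of that setting, again assuming the session is configured from the notebook; the 30-minute timeout is just an example value:

    from pyspark.sql import SparkSession

    # Sketch: also reclaim executors that hold cached blocks once they have
    # been idle for the given duration (by default they are kept forever).
    spark = (
        SparkSession.builder
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.cachedExecutorIdleTimeout", "30min")
        .getOrCreate()
    )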
For more details about dynamic allocation, see this presentation from Spark Summit Europe 2016.