I want to use Spark's Dynamic Resource Allocation (DRA) feature so that executors can be requested/released dynamically based on my application's workload to improve resource utilization. But I wonder whether I must enable the Spark external shuffle service to use DRA (that is, whether DRA depends on the external shuffle service to work).
In my opinion, DRA should depend on the external shuffle service to work well, so that a released executor's shuffle data can still be served to other executors after that executor is gone.
Is my understanding correct?
Broadly speaking, you are right: there should be some persistence mechanism for shuffle data to make dynamic allocation work. But in the narrower context of your question, I would go with a firm'ish NO, because modern Spark versions provide other means to persist and serve shuffle blocks beyond the External Shuffle Service (ESS). This is stated clearly and concisely in the Spark configuration documentation:
Property Name: spark.dynamicAllocation.enabled
Default: false
Meaning: Whether to use dynamic resource allocation... This requires one of the following conditions:
1) enabling external shuffle service through spark.shuffle.service.enabled, or
2) enabling shuffle tracking through spark.dynamicAllocation.shuffleTracking.enabled, or
3) enabling shuffle blocks decommission through spark.decommission.enabled and spark.storage.decommission.shuffleBlocks.enabled, or
4) (Experimental) configuring spark.shuffle.sort.io.plugin.class to use a custom ShuffleDataIO whose ShuffleDriverComponents supports reliable storage.
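For instance, condition 2 (shuffle tracking) lets you run DRA without any external shuffle service, which is the common approach on Kubernetes. A minimal sketch of the relevant spark-submit flags; the min/max executor counts and the app jar name (my-app.jar) are illustrative placeholders, not required values:

```shell
# Enable DRA using shuffle tracking instead of the external shuffle service.
# Spark keeps executors holding shuffle data alive until the data is no
# longer needed (or spark.dynamicAllocation.shuffleTracking.timeout expires).
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  my-app.jar
```

The trade-off of this mode is that executors with tracked shuffle data cannot be released as aggressively as they could be if the data were served externally.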
"No" is especially true for other Resource Managers than YARN, Kubernetes for example (which does not provide external shuffle service at all at the moment). The "-ish" in NO is because YARN still owns the majority, and requires that service for dynamic allocation.