apache-spark, spark-shuffle

Does Spark Dynamic Allocation depend on external shuffle service to work well?


I want to use the Spark DRA (Dynamic Resource Allocation) feature, so that executors can be requested and released dynamically based on my application's workload, improving resource utilization. But I wonder whether I must enable the Spark external shuffle service to use DRA (that is, whether DRA depends on the external shuffle service to work).

In my opinion, DRA should depend on the external shuffle service to work well, so that once an executor is released and gone, its shuffle data can still be served to the other executors.

Is my understanding correct?


Solution

  • Broadly speaking, you are right -- there must be some mechanism to persist shuffle data for dynamic allocation to work well. But in the narrower context of your question, I would go with a firm'ish NO, because modern Spark versions provide other means to persist and serve shuffle blocks beyond the External Shuffle Service (ESS). This is stated, clearly and concisely, in the Spark Configuration docs:

    Property Name: spark.dynamicAllocation.enabled
    Default: false
    Meaning: Whether to use dynamic resource allocation...
    This requires one of the following conditions:
    1) enabling external shuffle service through spark.shuffle.service.enabled, or
    2) enabling shuffle tracking through spark.dynamicAllocation.shuffleTracking.enabled, or
    3) enabling shuffle blocks decommission through spark.decommission.enabled and spark.storage.decommission.shuffleBlocks.enabled, or
    4) (Experimental) configuring spark.shuffle.sort.io.plugin.class to use a custom ShuffleDataIO whose ShuffleDriverComponents supports reliable storage.

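    For a concrete illustration, here is a minimal sketch of condition 1), the classic setup on YARN where the ESS already runs on every NodeManager. Only the two "enabled" properties are required; the app name and executor bounds are assumptions for illustration:

        import org.apache.spark.sql.SparkSession

        // Dynamic allocation backed by the External Shuffle Service (condition 1).
        // Assumes a YARN cluster whose NodeManagers already run the shuffle
        // service, so shuffle blocks of released executors stay reachable.
        val spark = SparkSession.builder()
          .appName("dra-with-ess") // hypothetical app name
          .config("spark.dynamicAllocation.enabled", "true")
          .config("spark.shuffle.service.enabled", "true")
          .config("spark.dynamicAllocation.minExecutors", "1") // assumed bounds
          .config("spark.dynamicAllocation.maxExecutors", "20")
          .getOrCreate()
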
    "No" is especially true for other Resource Managers than YARN, Kubernetes for example (which does not provide external shuffle service at all at the moment). The "-ish" in NO is because YARN still owns the majority, and requires that service for dynamic allocation.