Tags: palantir-foundry, foundry-code-repositories, foundry-python-transform

When would I prefer to run a job in static allocation vs. dynamic allocation?


I've read the docs in Foundry for what the differences are between the two, but I'm wondering in what circumstances I would want to apply the STATIC_ALLOCATION profile to my build to prevent my executors from being preempted.

Are there any other things I should watch out for when running in dynamic allocation mode?


Solution

  • There are several things you should watch out for when running in dynamic allocation:

    1. Expensive tasks can be restarted when your executors are preempted, leading to poor performance
      • If you know you'll have expensive tasks, consider applying the STATIC_ALLOCATION profile to your build so that your executors can't be preempted. Be careful not to request more executors than strictly necessary, since no other builds will be able to share those resources.
    2. Expensive tasks can be estimated incorrectly by AQE, leading to even worse behavior when preempted
      • If you know you'll have expensive tasks, consider applying the ADAPTIVE_DISABLED profile to your build so that AQE can't misestimate partition sizes and reduce your parallelism. AQE is fantastic for SQL-like operations such as joins and window functions, but when you use .udf or other behavior whose cost AQE can't accurately estimate, it may end up hurting you in practice.
      • If you do disable AQE, only do so for the builds where it's strictly necessary, i.e. those with many uses of .udf or other manual behavior. Leaving it on will help you in all other builds.
      • When you disable AQE, you'll also want to manually size the partitions for your expensive stages. In the worst possible case, count the rows in the input to your .udf and repartition() to that count, giving one row per task. This pays the maximum possible I/O cost, but it produces the smallest possible tasks and the most parallelism.
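
Both profiles above are applied with the `@configure` decorator on a Foundry Python transform. A minimal sketch, assuming the profile names discussed above are enabled on your enrollment (the dataset paths are hypothetical, for illustration only):

```python
from transforms.api import configure, transform_df, Input, Output

# STATIC_ALLOCATION: executors are held for the whole build and
# can't be preempted. ADAPTIVE_DISABLED: AQE won't re-plan partition
# counts mid-query. Apply only to the builds that need them.
@configure(profile=["STATIC_ALLOCATION", "ADAPTIVE_DISABLED"])
@transform_df(
    Output("/examples/expensive_output"),   # hypothetical path
    source=Input("/examples/expensive_input"),  # hypothetical path
)
def compute(source):
    return source
```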
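The "one row per task" worst case in the last bullet is just arithmetic. A minimal sketch of the partition-count calculation; the helper name and the `rows_per_task` knob are my own, not a Foundry API:

```python
import math

def target_partitions(row_count: int, rows_per_task: int = 1) -> int:
    """Partition count so each task handles at most rows_per_task rows.

    rows_per_task=1 is the worst case described above: one row per
    task, maximum I/O overhead, but the smallest possible tasks and
    the most parallelism for an expensive .udf stage.
    """
    return max(1, math.ceil(row_count / rows_per_task))

# In a transform you would then do something like:
#   n = expensive_input.count()
#   expensive_input = expensive_input.repartition(target_partitions(n))
```

In practice you would tune `rows_per_task` upward from 1 until the per-task scheduling and I/O overhead stops dominating the UDF's own cost.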