Tags: apache-spark, pyspark, azure-databricks, databricks-workflows

Multiple parallel Databricks tasks in job


Suppose I have a job deployed in Databricks that has multiple parallel tasks, with a single cluster attached for the run (1 driver, 4 workers). Below is a screengrab from the Jobs UI as an example.

[screengrab of the Databricks Jobs UI showing the parallel tasks]

How does the execution take place in this case? Are there multiple Spark sessions/Spark contexts all using the same driver/worker nodes with divided resources, or is something else happening? How are the resources utilized in such a scenario?


Solution

  • There is a single SparkSession / SparkContext per JVM, and it is shared across all the tasks that run on the shared job cluster; a quick way to confirm this is sketched in the first example below.

    Resources are also shared. How they are divided depends on the configuration and the resource-management policy you set: for example, Spark's FAIR scheduler and scheduler pools let you put a weight or limit on the work each task submits (see the scheduler-pool sketch below).
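
    A minimal sketch of how to confirm the shared context: run something like the following in each parallel task (notebook or Python script). On a shared job cluster every task should print the same application ID, because they all attach to the one SparkContext on the driver; `getOrCreate()` simply returns the session Databricks has already created.

    ```python
    # Minimal sketch: run in each parallel task to verify that the tasks share
    # one Spark application (i.e. one SparkContext) on the shared job cluster.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # returns the existing session on Databricks

    print("applicationId      :", spark.sparkContext.applicationId)      # same value in every task
    print("defaultParallelism :", spark.sparkContext.defaultParallelism)  # total cores across the workers
    ```

    If two tasks print different application IDs, they are not sharing a cluster (e.g. each task was given its own job cluster), and their resources are fully separate.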
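
    To divide the shared executors between concurrently running tasks, one option is Spark's FAIR scheduler with named pools. The sketch below assumes `spark.scheduler.mode` is set to `FAIR` in the cluster's Spark config; the pool name "etl_pool" is hypothetical, and pool weights/minShare would normally be defined in a fairscheduler.xml referenced by `spark.scheduler.allocation.file`.

    ```python
    # Minimal sketch, assuming the cluster's Spark config contains:
    #   spark.scheduler.mode FAIR
    # Each task routes the Spark jobs it triggers into its own scheduler pool,
    # so the shared executors are split according to the pools' weights.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    # "etl_pool" is a hypothetical pool name; pools are created on first use
    # unless defined explicitly via spark.scheduler.allocation.file.
    sc.setLocalProperty("spark.scheduler.pool", "etl_pool")

    # Everything triggered from here on in this task runs in that pool.
    df = spark.range(10_000_000)
    print(df.selectExpr("sum(id) AS total").collect())

    # Optionally reset to the default pool at the end of the task.
    sc.setLocalProperty("spark.scheduler.pool", None)
    ```

    Without FAIR scheduling, the default FIFO scheduler still shares the executors, but a long stage submitted by one task can occupy all cores until it finishes, delaying the stages submitted by the other tasks.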