apache-spark hadoop hadoop-yarn

What is the difference between the driver and the application master in Spark?


I couldn't figure out the difference between the Spark driver and the application master. Basically, in terms of responsibilities when running an application, who does what?

In client mode, the client machine hosts the driver and the app master runs on one of the cluster nodes. In cluster mode, the client hosts neither; the driver and the app master run on the same node (one of the cluster nodes).

What exactly are the operations that the driver performs, and which does the app master perform?

References:


Solution

  • As per the Spark documentation:

    Spark Driver:

    The driver (aka driver program) is responsible for converting a user application into smaller execution units called tasks and then scheduling them to run on executors, which it obtains through a cluster manager. The driver is also responsible for executing the Spark application and returning the status/results to the user.

    The Spark driver contains several components: DAGScheduler, TaskScheduler, SchedulerBackend, and BlockManager. They are responsible for translating user code into the actual Spark jobs executed on the cluster.
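
To make the driver's translation step more concrete, here is a minimal toy sketch in plain Python. It is not Spark's implementation: the names `WIDE_OPS`, `Stage`, `Task`, `split_into_stages`, and `tasks_for` are hypothetical, and the model assumes a job is a flat chain of operations that is cut into stages at every shuffle (wide) operation, with one task per partition in each stage. Real Spark derives stages from the RDD dependency graph.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified set of operations that trigger a shuffle.
WIDE_OPS = {"reduceByKey", "groupByKey", "join", "repartition"}


@dataclass
class Stage:
    stage_id: int
    ops: List[str]


@dataclass
class Task:
    stage_id: int
    partition: int


def split_into_stages(ops: List[str]) -> List[Stage]:
    """Cut a linear chain of operations at every wide (shuffle) operation."""
    stages: List[Stage] = []
    current: List[str] = []
    for op in ops:
        if op in WIDE_OPS and current:
            # A shuffle closes the current stage; the wide op opens the next one.
            stages.append(Stage(len(stages), current))
            current = []
        current.append(op)
    if current:
        stages.append(Stage(len(stages), current))
    return stages


def tasks_for(stage: Stage, num_partitions: int) -> List[Task]:
    """Create one task per partition of the given stage."""
    return [Task(stage.stage_id, p) for p in range(num_partitions)]


# A word-count-like job, described only by its operation names.
job = ["textFile", "flatMap", "map", "reduceByKey", "collect"]
stages = split_into_stages(job)
tasks = [t for s in stages for t in tasks_for(s, num_partitions=4)]
```

With this input the sketch yields two stages (split at `reduceByKey`) and eight tasks in total, which mirrors the idea that the driver, not the application master, decides what work exists and how it is divided.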

    Application Master:

    The Application Master is responsible for the execution of a single application. It requests containers from the Resource Manager's scheduler and runs specific programs in the containers it obtains. The Application Master is essentially a broker: it negotiates resources with the Resource Manager and, once containers are granted, makes sure work is launched in them. In Spark on YARN, the programs launched in those containers are the executors, and the driver schedules tasks onto them.
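
The broker role described above can be sketched as a small toy model. This is plain Python, not the YARN API: the `ResourceManager` and `ApplicationMaster` classes and their methods are hypothetical stand-ins, and the model assumes the Resource Manager simply grants as many containers as it has free.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceManager:
    """Hypothetical stand-in for YARN's Resource Manager: tracks free containers."""
    free_containers: int

    def allocate(self, requested: int) -> int:
        # Grant as many containers as are available, never more than requested.
        granted = min(requested, self.free_containers)
        self.free_containers -= granted
        return granted


@dataclass
class ApplicationMaster:
    """Hypothetical per-application broker: requests containers, launches executors."""
    rm: ResourceManager
    executors: List[str] = field(default_factory=list)

    def request_executors(self, wanted: int) -> int:
        granted = self.rm.allocate(wanted)
        for _ in range(granted):
            # Each granted container hosts one executor process.
            self.executors.append(f"executor-{len(self.executors) + 1}")
        return granted


rm = ResourceManager(free_containers=3)
am = ApplicationMaster(rm)
granted = am.request_executors(wanted=5)
```

Here the application asks for five executors but the cluster only has three free containers, so three executors are started. Note that nothing in this model knows about stages or tasks; that knowledge stays with the driver.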

    In a nutshell, the driver program translates your custom logic into jobs, stages, and tasks, while the Application Master makes sure to get enough resources from the Resource Manager and checks the status of what is running in its containers.

    As the references you provided already state, the only difference between client and cluster mode is:

    In client mode, the driver runs on the machine where the Spark application/job was submitted, and the AM runs on one of the cluster nodes.

    (AND)

    In cluster mode, the driver runs inside the Application Master, which means the Application Master has much more responsibility.
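
The two placements can be summarized in a tiny lookup sketch. The function name `placement` and the returned labels are hypothetical and purely illustrative; the mode names `client` and `cluster` correspond to the values accepted by the `--deploy-mode` option of `spark-submit`.

```python
def placement(deploy_mode: str) -> dict:
    """Return where the driver and the Application Master run on YARN (toy model)."""
    if deploy_mode == "client":
        return {
            "driver": "submitting machine",
            "application_master": "cluster node",
        }
    if deploy_mode == "cluster":
        return {
            # The driver is hosted inside the Application Master's container.
            "driver": "cluster node (inside the Application Master)",
            "application_master": "cluster node",
        }
    raise ValueError(f"unknown deploy mode: {deploy_mode!r}")


client = placement("client")
cluster = placement("cluster")
```

In both modes the Application Master lives on the cluster; only the location of the driver changes.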

    References :

    https://luminousmen.com/post/spark-anatomy-of-spark-application#:~:text=The%20Driver(aka%20driver%20program,status%2Fresults%20to%20the%20user.

    https://www.edureka.co/community/1043/difference-between-application-master-application-manager#:~:text=The%20Application%20Master%20is%20responsible,class)%20on%20the%20obtained%20containers.