apache-spark, apache-spark-standalone

What is the relationship between workers, worker instances, and executors?


In Spark Standalone mode, there are master and worker nodes.

Here are a few questions:

  1. Do 2 worker instances mean one worker node with 2 worker processes?
  2. Does every worker instance hold an executor for a specific application (which manages storage and tasks), or does one worker node hold one executor?
  3. Is there a flow chart explaining how Spark works at runtime, for example for a word count?

Solution

  • I suggest reading the Spark cluster docs first, but even more so this Cloudera blog post explaining these modes.

    Your first question depends on what you mean by 'instances'. A node is a machine, and there is rarely a good reason to run more than one worker per machine. So two worker nodes typically means two machines, each running one Spark worker.

    A worker can hold many executors, for many applications, and one application has executors on many workers (see the sketch after this answer).

    Your third question is not clear, but the word count sketch below gives a rough picture of what runs where.
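
    To make the executor/worker relationship concrete, here is a minimal word count sketch for standalone mode. The master URL, input/output paths, and the resource values are assumptions for illustration only, not something prescribed by the docs:

        import org.apache.spark.{SparkConf, SparkContext}

        object WordCount {
          def main(args: Array[String]): Unit = {
            // Hypothetical standalone master URL and resource settings, for illustration only.
            val conf = new SparkConf()
              .setAppName("WordCount")
              .setMaster("spark://master:7077") // standalone master (assumed address)
              .set("spark.cores.max", "8")      // total cores this application may use cluster-wide
              .set("spark.executor.cores", "2") // cores per executor; the master launches
                                                // executors on several workers to reach 8 cores

            val sc = new SparkContext(conf)

            // Classic word count: the driver builds the DAG, and the tasks of each
            // stage run inside the executors started on the worker nodes.
            val counts = sc.textFile("hdfs:///input/text") // assumed input path
              .flatMap(_.split("\\s+"))
              .map(word => (word, 1))
              .reduceByKey(_ + _)

            counts.saveAsTextFile("hdfs:///output/wordcount") // assumed output path
            sc.stop()
          }
        }

    With settings like these, the standalone master could assign this application four 2-core executors, possibly spread over several workers, while each of those workers can simultaneously host executors belonging to other applications.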