I have a machine with 80 cores. I'd like to start a Spark server in standalone mode on this machine with 8 executors, each with 10 cores. But, when I try to start my second worker on the master, I get an error.
$ ./sbin/start-master.sh
Starting org.apache.spark.deploy.master.Master, logging to ...
$ ./sbin/start-slave.sh spark://localhost:7077 -c 10
Starting org.apache.spark.deploy.worker.Worker, logging to ...
$ ./sbin/start-slave.sh spark://localhost:7077 -c 10
org.apache.spark.deploy.worker.Worker running as process 64606. Stop it first.
In the documentation, it clearly states "you can start one or more workers and connect them to the master via: ./sbin/start-slave.sh <master-spark-URL>
". So why can't I do that?
A way to get the same parallelism is to start many workers.
You can do this by adding to the ./conf/spark-env.sh file:
SPARK_WORKER_INSTANCES=8
SPARK_WORKER_CORES=10
SPARK_EXECUTOR_CORES=10