I have a cluster with two workers and one master.
To start master & workers I use the sbin/start-master.sh
and sbin/start-slaves.sh
in the master's machine. Then, the master UI shows me that the slaves are ALIVE (so, everything OK so far). Issue comes when I want to use spark-submit
.
I execute this command in my local machine:
spark-submit --master spark://<master-ip>:7077 --deploy-mode cluster /home/user/example.jar
But the following error pops up: ERROR ClientEndpoint: Exception from cluster was: java.nio.file.NoSuchFileException: /home/user/example.jar
I have been doing some research in stack overflow and Spark's documentation and it seems like I should specify the application-jar
of spark-submit
command as "Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes." (as it indicates https://spark.apache.org/docs/latest/submitting-applications.html).
My question is: how can I set my .jar as globally visible inside the cluster? There is a similar question in here Spark Standalone cluster cannot read the files in local filesystem but solutions do not work for me.
Also, am I doing something wrong by initialising the cluster inside my master's machine using sbin/start-master.sh
but then doing the spark-submit
in my local machine? I initialise the master inside my master's terminal because I read so in Spark's documentation, but maybe this has something to do with the issue. From Spark's documentation:
Once you’ve set up this file, you can launch or stop your cluster with the following shell scripts, based on Hadoop’s deploy scripts, and available in SPARK_HOME/sbin: [...] Note that these scripts must be executed on the machine you want to run the Spark master on, not your local machine.
Thank you very much
EDIT: I have copied the file .jar in every worker and it works. But my point is to know if there is a better way, since this method makes me copy the .jar to each worker everytime I create a new jar. (This was one of the answers from the question of the already posted link Spark Standalone cluster cannot read the files in local filesystem )
@meisan your spark-submit
command is missing out on 2 things.
--jar
Now you have not specified anywhere if you are using scala or python but in the nutshell your command will look something like:
for python :
spark-submit --master spark://<master>:7077 --deploy-mode cluster --jar <dependency-jars> <python-file-holding-driver-logic>
for scala:
spark-submit --master spark://<master>:7077 --deploy-mode cluster --class <scala-driver-class> --driver-class-path <application-jar> --jar <dependency-jars>
Also, spark takes care of sending the required files and jars to the executors when you use the documented flags.
If you want to omit the --driver-class-path
flag, you can set the environmental variable SPARK_CLASSPATH
to path where all your jars are placed.