I created a cluster on Google Cloud Platform with five Linux-based virtual machines (VMs): one master and four workers.
I ran ./start-master.sh on the master VM and ./start-worker.sh [external-master-IP:7077] on the worker VMs.
Now I simply want to run a GraphX example job, for example the PageRank algorithm that already ships with Spark, using ./bin/spark-submit.
I read the documentation, which says to run it like this:
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
And I tried this:
./bin/spark-submit \
--class org.apache.spark.examples.graphx.PageRankExample \
--master spark://<external-IP>:7077 \
--deploy-mode cluster
And it says:
"Error: Missing application resource."
Do I need to add a .jar? I can't find one for this PageRank example.
Thank you.
Yes, you need to pass the examples jar as the application resource at the end of the spark-submit command:
./bin/spark-submit \
--class org.apache.spark.examples.graphx.PageRankExample \
--master spark://<external-IP>:7077 \
--deploy-mode cluster \
examples/jars/spark-examples_[your version].jar
You should find it in the examples/jars folder under the Spark installation. The jar is named spark-examples_*.jar.
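If you would rather not type the version by hand, the lookup and the submit can be scripted. The following is only a sketch: the function names find_examples_jar and submit_pagerank are made up for illustration, and it assumes the standard layout described above (examples/jars under the Spark installation) with SPARK_HOME either set or equal to the current directory.

```shell
# Hypothetical helper: print the path of the first examples jar found under
# a Spark installation directory (argument 1, else $SPARK_HOME, else cwd).
find_examples_jar() {
  _fej_home="${1:-${SPARK_HOME:-$PWD}}"
  for _fej_jar in "$_fej_home"/examples/jars/spark-examples_*.jar; do
    # If the glob matched nothing it stays literal, so check it is a file.
    if [ -f "$_fej_jar" ]; then
      printf '%s\n' "$_fej_jar"
      return 0
    fi
  done
  echo "No spark-examples_*.jar under $_fej_home/examples/jars" >&2
  return 1
}

# Hypothetical wrapper: submit the PageRank example to a standalone master.
#   $1 = master URL, e.g. spark://<external-IP>:7077
#   $2 = Spark installation directory (optional)
submit_pagerank() {
  _sp_master="$1"
  _sp_home="${2:-${SPARK_HOME:-$PWD}}"
  _sp_jar="$(find_examples_jar "$_sp_home")" || return 1
  "$_sp_home/bin/spark-submit" \
    --class org.apache.spark.examples.graphx.PageRankExample \
    --master "$_sp_master" \
    --deploy-mode cluster \
    "$_sp_jar"
}
```

After sourcing this, you would call submit_pagerank spark://<external-IP>:7077 from the Spark installation directory. Keep in mind that with --deploy-mode cluster the jar path has to be reachable from the worker nodes as well, so this sketch assumes Spark is installed at the same path on every VM.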