
How can I submit a Spark GraphX example job on Google Cloud Platform?


I created a cluster on Google Cloud Platform with five Linux-based virtual machines (VMs): one master and four workers. I ran ./start-master.sh on the master VM and ./start-worker.sh [external-master-IP:7077] on each worker VM.

Now I want to run a GraphX example job, for example the PageRank algorithm that already ships with Spark, using ./bin/spark-submit.

The documentation says to run it like this:

./bin/spark-submit \
   --class <main-class> \
   --master <master-url> \
   --deploy-mode <deploy-mode> \
   --conf <key>=<value> \
   ... # other options
   <application-jar> \
   [application-arguments]

And I tried this:

./bin/spark-submit \
  --class org.apache.spark.examples.graphx.PageRankExample \
  --master spark://<external-IP>:7077 \
  --deploy-mode cluster

It fails with:

Error: Missing application resource.

Do I need to pass a .jar? I can't find one for this PageRank example.

Thank you.


Solution

  • Yes, you need to pass the jar as the last argument of the spark-submit command:

    ./bin/spark-submit \
      --class org.apache.spark.examples.graphx.PageRankExample \
      --master spark://<external-IP>:7077 \
      --deploy-mode cluster \
      examples/jars/spark-examples_[your version].jar
    

    You should find it in the examples/jars folder under the Spark installation. The jar is named spark-examples_*.jar, where the wildcard stands for the version of your installation.
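If you would rather not type the version by hand, the lookup can be scripted. The following is a minimal sketch, assuming a bash-compatible shell; `find_examples_jar` is a hypothetical helper (not part of Spark), and the directory you pass to it is assumed to be your own Spark installation.

```shell
# Hypothetical helper: print the path of the first spark-examples jar found
# under <spark-home>/examples/jars, or fail with a message on stderr.
find_examples_jar() {
  local spark_home="$1"
  local jar
  for jar in "$spark_home"/examples/jars/spark-examples_*.jar; do
    # If the glob matched nothing, the pattern itself is returned, so check -f.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no spark-examples jar found under $spark_home/examples/jars" >&2
  return 1
}

# Example use (commented out; assumes SPARK_HOME is set to your installation):
#   jar=$(find_examples_jar "$SPARK_HOME") || exit 1
#   "$SPARK_HOME"/bin/spark-submit \
#     --class org.apache.spark.examples.graphx.PageRankExample \
#     --master spark://<external-IP>:7077 \
#     "$jar"
```

The helper only resolves the path. The spark-submit options stay the same as in the command above, with the jar still passed as the final argument.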