Tags: apache-spark, sbt, spark-submit

Difference between running a Spark application with sbt run and with the spark-submit script


I am new to Spark and, while learning the framework, I have found that, to the best of my knowledge, there are two ways to run a Spark application written in Scala (a minimal example is sketched after the list):

  1. Package the project into a JAR file, and then run it with the spark-submit script.
  2. Run the project directly with sbt run.
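
For reference, here is roughly the kind of application I am running (the names and the explicit local master are just placeholders; with spark-submit the master would normally come from the command line instead):

```scala
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sbt-vs-spark-submit-demo")
      .master("local[*]") // needed for sbt run; spark-submit usually passes --master instead
      .getOrCreate()

    // Trivial job, just enough to exercise the local scheduler.
    val evens = spark.range(1, 1000).filter("id % 2 = 0").count()
    println(s"Even numbers: $evens")

    spark.stop()
  }
}
```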

I am wondering what the difference between these two modes of execution is, especially since running with sbt run can throw a java.lang.InterruptedException even though the same application runs perfectly with spark-submit.

Thanks!


Solution

  • SBT is a build tool (one I like to run on Linux) that does not necessarily imply Spark usage; it just happens to be used, much like IntelliJ, to build and run Spark applications.

    You can package and run an application in a single JVM under the SBT console, but not at scale. So, if you have created a Spark application with its dependencies declared, SBT will compile the code with package and create a JAR file, with the required dependencies available on the classpath, so that you can run it locally.
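
    As a rough sketch of that local workflow (library versions and names here are illustrative, not prescriptive), a build.sbt could look like the one below; sbt package then produces a thin JAR under target/, and sbt run launches the main class from sbt:

```scala
// build.sbt -- minimal sketch; versions are illustrative
name := "spark-demo"
version := "0.1.0"
scalaVersion := "2.12.18"

// Compile-scope dependency so `sbt run` can start Spark locally.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1"

// Forking the run into its own JVM is a common way to avoid the
// java.lang.InterruptedException noise that Spark's shutdown can trigger under sbt.
fork := true
```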

    You can also use the assembly option in SBT, which creates an uber JAR (fat JAR) with all dependencies contained in it, which you upload to your cluster and run by invoking spark-submit. So, again, if you have created a Spark application with its dependencies declared, SBT will, via assembly, compile the code and create an uber JAR with all required dependencies (except any external file(s) that you need to ship to the workers) so that you can run it on your cluster (in general).
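
    A similarly hedged sketch of the assembly variant (plugin version, class name, master and paths are placeholders): Spark itself is marked "provided" so it stays out of the uber JAR, since the cluster supplies it at runtime, and the resulting fat JAR is what you pass to spark-submit:

```scala
// project/plugins.sbt -- version is illustrative
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

// build.sbt -- the cluster already provides Spark, so keep it out of the uber JAR
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1" % "provided"

// Typical invocation afterwards (class name, master and path are placeholders):
//   sbt assembly
//   spark-submit --class com.example.Main --master yarn \
//     target/scala-2.12/spark-demo-assembly-0.1.0.jar
```

    Note that with Spark marked "provided", sbt run no longer sees the Spark classes at runtime, which is one more practical difference between the two workflows.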