Tags: apache-spark, spark-shell, scala-script

Running a Scala script with line breaks in spark-shell


I'm trying to run a Scala script through spark-shell using the following command: spark-shell -i myScriptFile.scala

I can get the above command to work when I have single-line commands, but if I have any line breaks in the script (for readability), the spark-shell (or REPL?) interprets each line as a complete statement. Here is a sample of my script:

import org.apache.spark.sql.types._
import java.util.Calendar
import java.text.SimpleDateFormat

// *********************** This is for Dev *********************** 
val dataRootPath = "/dev/test_data"
// *********************** End of DEV specific paths ***************

val format = new SimpleDateFormat("yyyy-MM-dd")
val currentDate = format.format(Calendar.getInstance().getTime()).toString

val cc_df = spark.read.parquet(s"${dataRootPath}/cc_txns")
    .filter($"TXN_DT" >= date_sub(lit(currentDate), 365) && $"TXN_DT" < lit(currentDate))
    .filter($"AMT" >= 0)

....

System.exit(0)

When running the spark-shell with this script, I get the following error:

<console>:1: error: illegal start of definition

The syntax for the script is correct because if I start the shell and manually paste this code in with :paste, everything works fine.

I have tried ending all multi-line commands with a backslash \ but that didn't work either.

Does anyone have any suggestions on how I can keep my script multi-lined but still be able to pass it to spark-shell as an argument at startup?


Solution

  • Wrap each multi-line expression in curly braces, so the parser treats the whole block as a single statement instead of evaluating it line by line:

    val x = {
      someStatement
        .someStatement2
        .someStatement3
        ...
    }
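Applied to plain Scala, the brace trick looks like this (a minimal sketch; the names and values are illustrative, not from the original script):

```scala
// Without the braces, `spark-shell -i` would evaluate the first line
// on its own and then fail on the leading dots of the lines below it.
// Wrapping the whole chained expression in { } makes it one statement.
val doubledAboveTwo = {
  List(1, 2, 3)
    .map(_ * 2)    // List(2, 4, 6)
    .filter(_ > 2) // List(4, 6)
}
```

The same pattern applies to the cc_df definition in the question: enclose the spark.read.parquet(...) call together with its chained .filter lines in a single pair of braces.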