scala, apache-spark, sbt, databricks, databricks-connect

How can I build a Scala Project with Databricks Connect in Visual Studio Code?


I am currently connecting my Visual Studio Code to my Databricks workspace using the Databricks Connect feature (my local machine is Windows). To do so, I followed the instructions here and here. I got it to work for PySpark, meaning I established the connection and can execute PySpark code against my cluster:

[Screenshot: PySpark Databricks Connect]

I would like to repeat the same small example using Scala code, but I do not know how. The Databricks documentation is not exhaustive, and my build.sbt fails; the build from this tutorial fails for me as well. Following the documentation, I created a build.sbt that looks as follows:

name := "scala_test"
version := "1.0"
scalaVersion := "2.12.10" // sbt needs a full Scala version, not just "2.12"

// this should be set to the path returned by ``databricks-connect get-jar-dir``
unmanagedBase := new java.io.File("C:/Users/user/Anaconda3/envs/databricksEnv/lib/site-packages/pyspark/jars")
mainClass := Some("com.example.Test")
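For context, the mainClass setting above is expected to name an object in the project that defines a main method. A minimal sketch of what src/main/scala/com/example/Test.scala could look like (the SparkSession usage here is an assumption for illustration, not part of the original question):

// Hypothetical src/main/scala/com/example/Test.scala, matching the
// mainClass setting in build.sbt above.
package com.example

import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]): Unit = {
    // With Databricks Connect, getOrCreate() talks to the remote cluster
    // configured via `databricks-connect configure`.
    val spark = SparkSession.builder().getOrCreate()
    spark.range(10).show()
    spark.stop()
  }
}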

I adjusted the build from the documentation to my Scala version and adapted the file path. However, the build fails with the following error:

2022.02.07 11:27:34 ERROR sbt command failed: C:\Program Files\Eclipse Adoptium\jdk-8.0.322.6-hotspot\jre\bin\java -Djline.terminal=jline.UnsupportedTerminal -Dsbt.log.noformat=true -Dfile.encoding=UTF-8 -jar 

Note that I am new to Scala and not entirely familiar with builds, etc., hence I struggle with debugging this issue. Here is the full output log for the Scala build in the terminal:

[Screenshot: Scala Databricks Connect]

In general, I am a little confused about how Databricks Connect works, but I would be super happy to get it running :)


Solution

  • Ok, actually this was simply because I was not providing the right mainClass in the build.sbt. For future reference: also make sure you are using the right JDK version, because as of the time of this answer only JDK 8 is supported. PySpark will compile with JDK 11, but Scala will (obviously) not.
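As a sanity check for the JDK requirement, one option (a sketch, not from the original answer) is to fail fast in build.sbt when sbt is not running on Java 8, using the standard sbt initialize idiom:

// Optional guard in build.sbt: abort early when sbt runs on a JDK other
// than 8, since this Databricks Connect setup only supports JDK 8 for Scala.
initialize := {
  val _ = initialize.value // run the previous initialization first
  val javaVersion = sys.props("java.specification.version")
  require(javaVersion == "1.8",
    s"JDK 8 is required for Databricks Connect; found Java $javaVersion")
}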