I don't know if this is happening because Scala is so restrictive about versions or because so many libraries are deprecated and no longer updated.
I have a small project in Scala Play with Apache Spark. I like to use the latest versions of libraries, so I started the project with:
Scala v2.12.2
Play Framework v2.8.2
Apache Spark v3.0.0
I need to read a CSV file, process it, and insert the data into an Impala Kudu database. Using a JDBC connection and inserting the data with prepared statements is not an improvement, because it doesn't use Apache Spark at its full power (Spark would only be reading the file).
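For context, this is roughly the plain-JDBC approach I wanted to avoid. The connection URL, table, columns, and data are made up for illustration; the point is the row-by-row prepared-statement pattern:

import java.sql.DriverManager

// Hypothetical connection URL, table, and data; adjust for your cluster.
val rows = Seq((1, "a"), (2, "b"))
val conn = DriverManager.getConnection("jdbc:impala://impala-host:21050/default")
val stmt = conn.prepareStatement("INSERT INTO my_kudu_table (id, name) VALUES (?, ?)")
try {
  rows.foreach { case (id, name) =>
    stmt.setInt(1, id)
    stmt.setString(2, name)
    stmt.executeUpdate() // one round trip per row: no Spark parallelism at all
  }
} finally {
  stmt.close()
  conn.close()
}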
So I heard about KuduContext. I tried to install it, but surprise: KuduContext only works with Scala v2.11 and Apache Spark v2.4.6 (nothing is said about Play).
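For reference, this is roughly the usage I was aiming for (assuming spark is a SparkSession and df is the DataFrame read from the CSV). The master address and table name are placeholders; Impala-created tables are addressed with the "impala::" prefix:

import org.apache.kudu.spark.kudu._

// Placeholder master address and table name.
val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

// Insert a DataFrame into an existing Kudu table as one distributed operation.
kuduContext.insertRows(df, "impala::default.my_table")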
I uninstalled Spark v3, then downloaded, installed, and set up the environment for Spark v2.4.6 again. I created a new project with this configuration:
Scala v2.11.11
Play Framework v2.8.2
Apache Spark v2.4.6
KuduSpark2 v1.12.0
I found something incompatible with Play and downgraded it to 2.7. Later, I found some incompatibilities with the Jackson module:
java.lang.ExceptionInInitializerError
...
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.9.10-1
I was required to add "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.5".
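In build.sbt that pin is an ordinary dependency; if the conflict comes from a transitive dependency, a dependencyOverrides entry may be needed instead:

libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.5"
// or, to force the version across transitive dependencies:
dependencyOverrides += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.5"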
Now, when I start the project and use SparkContext, I get another error:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
...
Finally, my build.sbt became:
Scala v2.11.11
Play Framework v2.8.2
Apache Spark v2.4.6
KuduSpark2 v1.12.0
jackson-module-scala v2.6.5
Some code:
import org.apache.spark.sql.SparkSession

object SparkContext {
  // Shared SparkSession for the whole app.
  val spark = SparkSession
    .builder
    .appName("SparkApp")
    .master("local[*]")
    .config("spark.sql.warehouse.dir", "file:///C:/temp") // Necessary to work around a Windows bug in Spark 2.0.0; omit if you're not on Windows.
    .getOrCreate()

  val context = spark.sparkContext
}
SparkContext is used here:
val df = SparkContext.spark.read.csv(filePath) // here the error occurred
val lines = df.take(1001).map(mapper)
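(mapper isn't defined in the snippets above; it's just whatever per-row transformation you need. A hypothetical placeholder could be:)

import org.apache.spark.sql.Row

// Hypothetical placeholder: join the columns of a row back into one string.
def mapper(row: Row): String = row.mkString(",")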
Is it really this hard to keep track of compatibility with other libraries when a new library version is released in this ecosystem? I found a lot of posts about version incompatibilities, but no solution. What am I missing here? Thanks.
Damn, I found the solution:
libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.11.1"
libraryDependencies += "com.fasterxml.jackson.core" % "jackson-databind" % "2.11.1"
Besides jackson-module-scala, I needed to add jackson-databind.
My build.sbt became:
scalaVersion := "2.11.11"

val sparkVersion = "2.4.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.kudu" %% "kudu-spark2" % "1.12.0"
)

libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.11.1"
libraryDependencies += "com.fasterxml.jackson.core" % "jackson-databind" % "2.11.1"
// plugins.sbt
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.7.5")
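With these versions in place, the end-to-end flow I was after looks roughly like this sketch. The Kudu master address and table name are placeholders, and insertRows expects the target table to already exist (e.g. created through Impala):

import org.apache.kudu.spark.kudu._

// Read the CSV with Spark...
val df = SparkContext.spark.read
  .option("header", "true") // assumption: the CSV has a header row
  .csv(filePath)

// ...and write it to Kudu as one distributed operation instead of
// row-by-row JDBC inserts. Placeholder master address and table name.
val kuduContext = new KuduContext("kudu-master:7051", SparkContext.spark.sparkContext)
kuduContext.insertRows(df, "impala::default.my_table")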
I really hope this helps somebody else who needs to use these libraries together and runs into issues like mine. I spent 3 hours finding a solution for a "simple" project configuration.