scala, apache-spark, playframework, apache-kudu

Scala Play, Apache Spark and KuduContext incompatibilities


I don't know if this is happening because Scala is so restrictive about versions or because all the libraries are deprecated and not kept up to date.

I have a little project in Scala Play with Apache Spark. I like to use the latest versions of libraries, so I started the project with:

Scala v2.12.2
Play Framework v2.8.2
Apache Spark v3.0.0

I need to read a CSV file, process it and insert the data into an Impala/Kudu database. Using a JDBC connection and inserting the data with prepared statements is not a good option, because it doesn't use Apache Spark at its full power (Spark would only be used to read the file).

So I heard about KuduContext. I tried to install it, but surprise: KuduContext only works with Scala v2.11 and Apache Spark v2.4.6 (nothing is said about Play).
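
The kind of write I'm aiming for looks roughly like this (just a sketch; the Kudu master address and table name are placeholders, not my real setup):

import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

// Sketch only: "kudu-master:7051" and the table name below are placeholders.
def writeCsvToKudu(spark: SparkSession, filePath: String): Unit = {
  val df = spark.read.option("header", "true").csv(filePath)
  val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)
  // Tables created through Impala are exposed to Kudu as "impala::db.table".
  kuduContext.insertRows(df, "impala::default.my_table")
}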

I uninstalled Spark v3, then downloaded, installed and set up the environment for Spark v2.4.6 again, and created a new project with this configuration:

Scala v2.11.11
Play Framework v2.8.2
Apache Spark v2.4.6
KuduSpark2 v1.12.0

I found something incompatible with Play and downgraded it to 2.7. Later, I found some incompatibilities with the Jackson module:

java.lang.ExceptionInInitializerError
...
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.9.10-1

I had to add "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.5". Now, when I start the project and use SparkContext, I get another error:

java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$

...

Finally, my build.sbt became:

Scala v2.11.11
Play Framework v2.8.2
Apache Spark v2.4.6
KuduSpark2 v1.12.0
jackson-module-scala v2.6.5
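
Expressed as sbt dependencies, that is roughly (a sketch; the artifact coordinates are the usual ones for these versions):

scalaVersion := "2.11.11"

val sparkVersion = "2.4.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.kudu" %% "kudu-spark2" % "1.12.0",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.5"
)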

Some code:

import org.apache.spark.sql.SparkSession

object SparkContext {
  val spark = SparkSession
    .builder
    .appName("SparkApp")
    .master("local[*]")
    .config("spark.sql.warehouse.dir", "file:///C:/temp") // Necessary to work around a Windows bug in Spark 2.0.0; omit if you're not on Windows.
    .getOrCreate()

  val context = spark.sparkContext
}

SparkContext is used here:

val df = SparkContext.spark.read.csv(filePath) // the error occurred here
val lines = df.take(1001).map(mapper)
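
For context, mapper is just a row-to-record conversion, roughly along these lines (a hypothetical sketch; the real column layout depends on the CSV):

import org.apache.spark.sql.Row

// Hypothetical record type -- the actual CSV columns are not shown here.
case class CsvRecord(id: String, name: String, amount: String)

def mapper(row: Row): CsvRecord =
  CsvRecord(row.getString(0), row.getString(1), row.getString(2))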

Is it really so hard to stay compatible with other libraries when a new library version is released in this ecosystem? I found a lot of posts about version incompatibilities, but no solution. What am I missing here? Thanks.


Solution

  • Damn, I found the solution:

    libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.11.1"
    libraryDependencies += "com.fasterxml.jackson.core" % "jackson-databind" % "2.11.1"
    

    Besides jackson-module-scala, I also needed to add jackson-databind.

    My build.sbt became:

    scalaVersion := "2.11.11"
    val sparkVersion = "2.4.6"
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % sparkVersion,
      "org.apache.spark" %% "spark-sql" % sparkVersion,
      "org.apache.kudu" %% "kudu-spark2" % "1.12.0"
    )
    libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.11.1"
    libraryDependencies += "com.fasterxml.jackson.core" % "jackson-databind" % "2.11.1"
    
    // plugins.sbt
    addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.7.5")
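
    A variant of the same Jackson fix (just a sketch, not what I ended up using) is to pin the versions with sbt's dependencyOverrides instead, so the transitive Jackson artifacts are forced without being added as direct dependencies:

    // build.sbt -- force the Jackson versions pulled in transitively
    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.11.1"
    dependencyOverrides += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.11.1"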
    

    I really hope this helps somebody else who needs to use these libraries together and runs into issues like mine. I spent 3 hours finding a solution for a "simple" project configuration.