scala, apache-spark, activemq-classic, apache-bahir

Unable to start Spark application with Bahir


I am trying to run a Spark application in Scala that connects to ActiveMQ. I am using Bahir for this, via format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider"). With Bahir 2.2 in my build.sbt the application runs fine, but after switching to Bahir 2.3 or 2.4 it no longer starts and fails with this error:

[error] (run-main-0) java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
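
For reference, the streaming source is wired up roughly like this (a minimal sketch; the broker URL and topic name are placeholders for my real ones, and it assumes ActiveMQ's MQTT transport connector is listening on its default port 1883):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ActiveMQTest")
  .master("local[*]")
  .getOrCreate()

// Read from an ActiveMQ topic over MQTT via the Bahir source
val lines = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", "my-activemq-topic")   // placeholder topic name
  .load("tcp://localhost:1883")           // placeholder broker URL

// Print incoming messages to the console for testing
val query = lines.writeStream
  .format("console")
  .start()

query.awaitTermination()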

How can I fix this? Is there an alternative to Bahir that I can use in Spark Structured Streaming to connect to ActiveMQ topics?

EDIT: my build.sbt

//For Spark
libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "2.4.0",
    "org.apache.spark" %% "spark-mllib" % "2.4.0",
    "org.apache.spark" %% "spark-sql" % "2.4.0",
    "org.apache.spark" %% "spark-hive" % "2.4.0",
    "org.apache.spark" %% "spark-streaming" % "2.4.0",
    "org.apache.spark" %% "spark-graphx" % "2.4.0"
)

//Bahir
libraryDependencies += "org.apache.bahir" %% "spark-sql-streaming-mqtt" % "2.4.0"

Solution

  • Okay, so it seems there is some kind of compatibility issue between Spark 2.4 and Bahir 2.4. I fixed it by rolling both of them back to version 2.3.

    Here is my build.sbt

    name := "sparkTest"
    
    version := "0.1"
    
    scalaVersion := "2.11.11"
    
    //For Spark
    libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-core" % "2.3.0",
        "org.apache.spark" %% "spark-mllib" % "2.3.0",
        "org.apache.spark" %% "spark-sql" % "2.3.0",
        "org.apache.spark" %% "spark-hive" % "2.3.0",
        "org.apache.spark" %% "spark-streaming" % "2.3.0",
        "org.apache.spark" %% "spark-graphx" % "2.3.0"
    //    "org.apache.spark" %% "spark-streaming-kafka" % "1.6.3",
    )
    
    //Bahir
    libraryDependencies += "org.apache.bahir" %% "spark-sql-streaming-mqtt" % "2.3.0"
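
    One way to avoid this kind of mismatch in the future (a small sketch, not part of the fix above) is to define the version once and reuse it, since the spark-sql-streaming-mqtt artifact is versioned alongside the Spark release it targets:

    // Keep Spark and Bahir on the same release line with a single version value
    val sparkVersion = "2.3.0"

    libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-core" % sparkVersion,
        "org.apache.spark" %% "spark-sql" % sparkVersion,
        "org.apache.bahir" %% "spark-sql-streaming-mqtt" % sparkVersion
    )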