Tags: twitter, spark-streaming, apache-bahir, spark-shell

Apache Spark 2.3.1 - twitter is not a member of package org.apache.spark.streaming


I have been looking into this problem for a while now. I can see that solutions exist for other versions, but nothing for Apache Spark 2.3.1.

In short, I am trying to build an application that uses Bahir to run analytics on Twitter messages in Spark.

Since I am using Apache Spark 2.3.1, the closest Bahir version I found was 2.3.0-SNAPSHOT.

But when I try to load it with bin/spark-shell --packages org.apache.bahir:spark-streaming-twitter_2.11:2.3.0-SNAPSHOT, the package can't be found. From my local spark-shell:

    :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
    Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.bahir#spark-streaming-twitter_2.11;2.3.0-SNAPSHOT: not found]

It may be a naive assumption, but I figured that 2.3.0 would work with 2.3.1.

I can fetch version 2.2.1, and twitter4j seems to work, but I still can't get the actual streaming.twitter._ import to work with Spark 2.3.1. From my local spark-shell:

    scala> import org.apache.spark.streaming.twitter._
    <console>:23: error: object twitter is not a member of package org.apache.spark.streaming
           import org.apache.spark.streaming.twitter._
                                             ^

Does anybody know whether Bahir's Twitter connector is usable with Apache Spark 2.3.1?

Or am I simply forced to downgrade my Spark version to make it work?

I am doing this in a Zeppelin notebook, but I have also tried it outside of Zeppelin, so it does not seem to have anything to do with the notebook.

Thankful for any insights.


Solution

  • I faced the same issue. I can't downgrade Spark, because I need 2.3 for another Helium plugin. So I'm going to try Bahir's dependency:

    <dependency>
        <groupId>org.apache.bahir</groupId>
        <artifactId>spark-streaming-twitter_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    

    I will let you know if it works...

    Edit: It did work.

    %spark2.dep
    z.reset()
    z.addRepo("MavenCentral").url("https://repo1.maven.org/maven2/")
    z.load("org.apache.bahir:spark-streaming-twitter_2.11:2.3.0")
    

    This solves the issue of the Twitter libraries not existing in newer Spark releases, and also the issue of classes from "old Spark" missing in "new Spark".

    I was able to run some examples using a JavaScript Leaflet map and Spark 2.3 with Spark Streaming :)
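
Once the 2.3.0 dependency above is loaded, the import from the question should resolve. As a rough illustration only (not the answerer's actual code), here is a minimal sketch of reading tweets and counting hashtags. It assumes it runs in spark-shell or a Zeppelin Spark paragraph where `sc` is already defined, and the four credential strings are placeholders you would replace with your own Twitter app keys. The 10-second batch interval and the one-minute run time are arbitrary choices.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

// Placeholder credentials (assumption): twitter4j reads these system properties.
System.setProperty("twitter4j.oauth.consumerKey", "YOUR_CONSUMER_KEY")
System.setProperty("twitter4j.oauth.consumerSecret", "YOUR_CONSUMER_SECRET")
System.setProperty("twitter4j.oauth.accessToken", "YOUR_ACCESS_TOKEN")
System.setProperty("twitter4j.oauth.accessTokenSecret", "YOUR_ACCESS_TOKEN_SECRET")

// `sc` is the SparkContext that spark-shell and Zeppelin predefine.
val ssc = new StreamingContext(sc, Seconds(10))

// Passing None lets TwitterUtils build the authorization from the properties above.
val tweets = TwitterUtils.createStream(ssc, None)

// Extract hashtags from each tweet and count them within each batch.
val hashtags = tweets.flatMap(status => status.getText.split("\\s+").filter(_.startsWith("#")))
hashtags.countByValue().print()

ssc.start()
ssc.awaitTerminationOrTimeout(60 * 1000L) // let it run for about a minute
ssc.stop(stopSparkContext = false)        // keep the shared SparkContext alive
```

Stopping only the streaming context matters in a notebook, since the same `sc` is shared by later paragraphs.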