apache-spark apache-kafka cloudera cloudera-cdh

Spark Streaming with Spark 2 and Kafka 2.1


I'm upgrading a Java project from Cloudera 5.10 to Cloudera 6.2. We have a Spark Streaming job that reads data from Kafka, processes it, and writes the results elsewhere. As part of the upgrade, Spark goes from v1.6 to v2.1 and Kafka from v0.8 to v2.1.

For the stream processing, we were connecting to Kafka with KafkaUtils.createStream(...), but KafkaUtils no longer seems to be available with Kafka 2.1. I also can't find any Java example or documentation for Spark Streaming + Kafka that doesn't use this method.

Is there something I'm missing? What is the best way to connect both worlds in these versions?


Solution

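  • The Kafka integration module was renamed to spark-streaming-kafka-0-10. It still provides KafkaUtils, now in the package org.apache.spark.streaming.kafka010, with createDirectStream in place of the receiver-based createStream.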

    https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10

    That said, consider using Structured Streaming instead; it reads from Kafka through the separate spark-sql-kafka-0-10 module.
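
For the DStream route, a rough Java sketch of what the renamed module expects might look like the following. The compiled part uses only the JDK and builds the consumer configuration map. The Spark call itself is shown in a comment because it needs spark-streaming-kafka-0-10 on the classpath. The class name KafkaParamsSketch, the broker address, the group id, and the topic name are all made up for illustration, and the offset settings are just one plausible choice, not a recommendation for your job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Spark or Kafka): builds the consumer
// configuration map that the 0-10 integration takes as "kafkaParams".
class KafkaParamsSketch {

    static Map<String, Object> build(String bootstrapServers, String groupId) {
        Map<String, Object> params = new HashMap<>();
        // The new consumer API talks to the brokers directly, not to ZooKeeper.
        params.put("bootstrap.servers", bootstrapServers);
        // Kafka accepts deserializers as class names or as Class objects.
        params.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        params.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        params.put("group.id", groupId);
        // Assumed settings: start from the latest offsets, no auto-commit.
        params.put("auto.offset.reset", "latest");
        params.put("enable.auto.commit", false);
        return params;
    }

    public static void main(String[] args) {
        // Placeholder broker address and group id.
        Map<String, Object> kafkaParams = build("broker1:9092", "example-group");
        System.out.println(kafkaParams.get("bootstrap.servers"));

        // With spark-streaming-kafka-0-10 on the classpath, the map would be
        // handed to the direct stream roughly like this (not compiled here;
        // jssc is an existing JavaStreamingContext):
        //
        // JavaInputDStream<ConsumerRecord<String, String>> stream =
        //     KafkaUtils.createDirectStream(
        //         jssc,
        //         LocationStrategies.PreferConsistent(),
        //         ConsumerStrategies.<String, String>Subscribe(
        //             Arrays.asList("example-topic"), kafkaParams));
    }
}
```

Compared with the old createStream call, there is no ZooKeeper quorum argument and no per-topic thread count; the direct stream maps Kafka partitions to Spark partitions itself.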