apache-kafka apache-storm

What capabilities does Apache Storm offer that are not now covered by Kafka Streams?


I'm very naive about data engineering, but it seems to me that a popular data pipeline used to be Kafka to Storm to something else. As I understand it, though, Kafka now has data processing capabilities of its own that may often render Storm unnecessary. So my question is simply: in what scenarios can Kafka do it all, and in what scenarios might Storm still be useful?

EDIT: Question was flagged for "opinion based".

This question tries to understand what capabilities Apache Storm offers that Kafka Streams does not (now that Kafka Streams exists). The accepted answer touches on that. No opinions are requested by this question, nor are they necessary to answer it. Question title edited to seem more objective.


Solution

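  • Kafka Streams is a library, so you still need to deploy the Kafka Streams code somewhere yourself, much as you would deploy a Storm topology (e.g. to YARN).
    Plus, Kafka Streams can only read from and write to topics within the same Kafka cluster, whereas Storm has other spouts and bolts for talking to external systems. Kafka Connect is one alternative for bridging that gap.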

    Kafka Streams has no dependency on an external cluster scheduler, and while you can write Kafka clients in almost any popular programming language, the application still has to run on infrastructure you provide, whether that's a Docker container or bare metal.

    If anything, I'd say Heron or Flink are more direct replacements for Storm.
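To make the sources-and-sinks difference concrete, here is a toy, stdlib-only Python model. It is an illustration under stated assumptions, not the real Storm or Kafka Streams APIs (both are JVM APIs), and every class and function name below is hypothetical. In the Storm-style half, the spout reads straight from an arbitrary source and the last bolt writes to an arbitrary sink. In the Kafka-Streams-style half, the app only moves records between topics of one cluster, so external data first has to be brought in by something playing the role of Kafka Connect.

```python
# Toy, in-memory model of the two dataflow shapes described above.
# NOT the real Storm or Kafka Streams API: every class and function
# name here is a hypothetical stand-in chosen for illustration.
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List


# --- Storm-style topology: spouts and bolts can talk to any system ---

def list_spout(source: Iterable[str]) -> Iterator[str]:
    """Hypothetical spout: emits tuples from an arbitrary (non-Kafka) source."""
    yield from source


def split_bolt(tuples: Iterable[str]) -> Iterator[str]:
    """Hypothetical bolt: splits each incoming line into words."""
    for line in tuples:
        yield from line.split()


def count_bolt(tuples: Iterable[str], sink: Dict[str, int]) -> None:
    """Hypothetical terminal bolt: writes word counts to an arbitrary sink."""
    sink.update(Counter(tuples))


# --- Kafka-Streams-style app: input and output are topics in ONE cluster ---

class ToyKafkaCluster:
    """Hypothetical stand-in for a single Kafka cluster (topic -> records)."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[str]] = {}

    def produce(self, topic: str, record: str) -> None:
        self.topics.setdefault(topic, []).append(record)


def run_streams_app(cluster: ToyKafkaCluster, source_topic: str,
                    sink_topic: str,
                    transform: Callable[[str], Iterable[str]]) -> None:
    """Both ends of the pipeline are topics on the same cluster object."""
    # Snapshot the source so writing to the same topic cannot loop forever.
    for record in list(cluster.topics.get(source_topic, [])):
        for out in transform(record):
            cluster.produce(sink_topic, out)


def toy_connect_source(cluster: ToyKafkaCluster, topic: str,
                       source: Iterable[str]) -> None:
    """Hypothetical stand-in for a Kafka Connect source: external data -> topic."""
    for record in source:
        cluster.produce(topic, record)


if __name__ == "__main__":
    lines = ["to be or", "not to be"]

    # Storm-style: external source -> bolts -> external sink, in one topology.
    storm_sink: Dict[str, int] = {}
    count_bolt(split_bolt(list_spout(lines)), storm_sink)
    print(storm_sink)

    # Kafka-Streams-style: data must land in a topic first, and results
    # go back to a topic on the same cluster.
    cluster = ToyKafkaCluster()
    toy_connect_source(cluster, "lines", lines)
    run_streams_app(cluster, "lines", "words", str.split)
    print(cluster.topics["words"])
```

The only point the model tries to capture is where the boundaries sit: with the Kafka Streams shape, getting data into or out of the cluster is someone else's job (Kafka Connect or your own producer/consumer code), whereas a Storm topology can own both ends.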