To understand what kafka-streams is, I first need to know what stream-processing is. When I start reading about them online I am not able to grasp an overall picture, because it is a never-ending tree of links to new concepts.
Can anyone explain what stream-processing is, with a simple real-world example? And how does it relate to kafka-streams and the producer/consumer architecture?
Thank you.
Stream Processing is based on the fundamental concept of unbounded streams of events (in contrast to static sets of bounded data as we typically find in relational databases).
Taking that unbounded stream of events, we often want to do something with it. An unbounded stream of events could be temperature readings from a sensor, network data from a router, orders from an e-commerce system, and so on.
Let's imagine we want to take this unbounded stream of events; perhaps it's manufacturing events from a factory about 'widgets' being manufactured.
We want to filter that stream based on a characteristic of the 'widget', and if it's red, route it to another stream. Maybe we'll use that stream for reporting, or for driving another application that needs to respond only to red widget events.
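To make the red-widget example concrete without any Kafka involved, here is a minimal plain-Java sketch. The `Widget` record, its `id` and `colour` fields, and the `routeRed` method are hypothetical names made up for illustration, and a short list stands in for the stream, which in reality would never end. It assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class RedWidgetFilter {

    /** One manufacturing event (hypothetical shape: an id and a colour). */
    public record Widget(int id, String colour) {}

    /**
     * Reads events one at a time as they arrive and forwards only the red ones.
     * The source is an Iterator because a stream has no known end: we only ever
     * look at the current event, never at "the whole data set".
     */
    public static void routeRed(Iterator<Widget> widgets, Consumer<Widget> redWidgets) {
        while (widgets.hasNext()) {
            Widget widget = widgets.next();
            if ("red".equals(widget.colour())) {
                redWidgets.accept(widget);
            }
        }
    }

    public static void main(String[] args) {
        // A small finite sample stands in for the endless stream of factory events.
        List<Widget> sample = List.of(
                new Widget(1, "red"),
                new Widget(2, "blue"),
                new Widget(3, "red"),
                new Widget(4, "green"));

        // The "other stream": here just a list that collects the red widgets.
        List<Widget> redStream = new ArrayList<>();
        routeRed(sample.iterator(), redStream::add);

        redStream.forEach(System.out::println);
    }
}
```

The point of the sketch is the shape of the loop: each event is handled as it arrives and passed on (or dropped), rather than being stored first and queried later as in a database.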
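This, in a rather crude nutshell, is stream processing. Beyond filtering, stream processing is used to do things like transforming events, joining one stream with another, and aggregating events over windows of time.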
As you mentioned, there are a large number of articles about this; without wanting to give you yet another link to follow, I would recommend this one.
Kafka Streams is a stream processing library, provided as part of Apache Kafka. You use it in your Java applications to do stream processing.
In the context of the above example it looks like this:
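As a rough sketch of the red-widget filter in Kafka Streams, under some loud assumptions: the topic names `widgets` and `widgets-red`, the application id, and the broker address `localhost:9092` are all made up for illustration, and each record's value is assumed to be just the widget's colour as a plain string (a real application would more likely use a proper serde for a widget object). It needs the `kafka-streams` dependency on the classpath and a running Kafka broker.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class RedWidgetsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical application id and broker address; adjust for your setup.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "red-widgets-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Consume the (assumed) "widgets" topic; the value is assumed to be the colour.
        builder.stream("widgets", Consumed.with(Serdes.String(), Serdes.String()))
               // Keep only the red widgets.
               .filter((key, colour) -> "red".equals(colour))
               // Produce them to the (assumed) "widgets-red" topic.
               .to("widgets-red", Produced.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        // Close the application cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

Note how this maps onto the producer/consumer picture: `stream(...)` plays the consumer role on the input topic, `to(...)` plays the producer role on the output topic, and the library handles the polling and sending in between.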
Kafka Streams is built on top of the Kafka producer/consumer API, and abstracts away some of the low-level complexities. You can learn more about it in the documentation.