python apache-kafka faust

Join Kafka streams in Python


I need to work with Kafka streams in Python and am evaluating the available libraries. This question has some good answers, and the Faust fork looks like the most complete Kafka library for Python. However, I need to join different Kafka streams, and I am not sure how to do that or whether it is even supported.

I searched Faust's documentation and found some definitions for joins, but in the source code they are not implemented. So joins appear to be unsupported, but maybe I am missing something, or there is a different library that does support them.

I also found this relevant question, but it is from 2017, so a lot could have changed in the Python world since then.


Solution

  • How did you get on with joins using Faust?

    FYI, there are open-source Python alternatives to Faust that focus on the Python developer experience and ship features regularly. I work on Quix Streams; like Faust, it is a pure Python alternative, doesn't require a server-side cluster, and has good adoption. Fun fact: Fluvii is mentioned here in the comments, and its previous lead engineer now works on Quix Streams (he wrote a blog post on stateful streaming concepts).

    There is support for windowing, stateful functions and exactly-once semantics. Joins are on our roadmap for this year, but it is possible to implement them manually today with a reducing step in a hopping window (example code here).
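To make the "reducing step in a hopping window" idea concrete, here is a minimal sketch in plain Python. It deliberately uses only the standard library and is not the Quix Streams or Faust API. The names `hopping_windows`, `reduce_event` and `join_streams`, the event layout (`ts`, `key`, `side`, `value`) and the integer timestamps are all assumptions made for illustration. Events from both streams are merged, each event is folded into the state of every window that contains its timestamp, and left/right values that share a key and a window are paired.

```python
from collections import defaultdict


def hopping_windows(ts, size, step):
    """Return the start times of every window [start, start + size) containing ts.

    Assumes non-negative integer timestamps and that windows start at multiples of step.
    """
    starts = []
    start = (ts // step) * step  # latest window that has started at or before ts
    while start > ts - size and start >= 0:
        starts.append(start)
        start -= step
    return sorted(starts)


def reduce_event(state, event):
    """Reducing step: fold one event into the per-key, per-window state."""
    state[event["side"]].append(event["value"])
    return state


def join_streams(events, size, step):
    """Inner-join 'left' and 'right' events that share a key and a hopping window."""
    windows = defaultdict(lambda: {"left": [], "right": []})
    for event in sorted(events, key=lambda e: e["ts"]):
        for start in hopping_windows(event["ts"], size, step):
            reduce_event(windows[(event["key"], start)], event)

    results = []
    for (key, start), state in sorted(windows.items()):
        for left in state["left"]:
            for right in state["right"]:
                results.append((key, start, left, right))
    return results
```

Because hopping windows overlap, the same pair can be emitted once per window that contains both events, so downstream code may need to deduplicate. In a real streaming application the window state would live in the library's state store and results would be emitted when a window closes, rather than after reading all events as this batch-style sketch does.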