apache-kafka microservices event-driven idempotent

Why do we need the Outbox pattern with Kafka?


I do not understand why we need the Outbox pattern when we have Event Driven architecture with Kafka.

Kafka offers message delivery guarantees, as described here, including idempotency on producers and consumers with "Exactly-Once". Isn't that sufficient?

If you answer, please include an example.


Solution

  • From the blog post "Exactly-Once Semantics Are Possible: Here’s How Kafka Does It":

    Note that exactly-once semantics is guaranteed within the scope of Kafka Streams’ internal processing only; for example, if the event streaming app written in Streams makes an RPC call to update some remote stores, or if it uses a customized client to directly read or write to a Kafka topic, the resulting side effects would not be guaranteed exactly once.

    Stream processing systems that only rely on external data systems to materialize state support weaker guarantees for exactly-once stream processing. Even when they use Kafka as a source for stream processing and need to recover from a failure, they can only rewind their Kafka offset to reconsume and reprocess messages, but cannot rollback the associated state in an external system, leading to incorrect results when the state update is not idempotent.

    So it depends on what your app actually does with Kafka. If all of your processing stays inside Kafka Streams' internal processing, you do not need the transactional outbox.

    I'm not a Kafka expert, but updating a database, for example, counts as a side effect. Unless you can somehow use a distributed transaction that spans Kafka and the database, you might need the transactional outbox (or change data capture, CDC, via transaction log tailing).

    The scenario is quite common: you need to atomically update the database and send messages to a message broker.
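Since the question asks for an example, here is a minimal sketch of the transactional outbox idea, not a production implementation. It uses an in-memory SQLite database so it runs on its own. The table names (`orders`, `outbox`), the functions (`create_schema`, `place_order`, `relay_outbox`) and the `publish` callback are all hypothetical names chosen for illustration; in a real service `publish` would wrap a Kafka producer send, and the relay could be replaced by CDC.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    # Business table plus an outbox table living in the SAME database.
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, item TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " event_id TEXT NOT NULL UNIQUE,"
        " topic TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " sent INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn, item, fail_before_commit=False):
    """Write the order and its event in one local database transaction.

    Either both rows are committed or neither is, so there is never an
    order without an event or an event without an order.
    """
    order_id = str(uuid.uuid4())
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item)
            )
            conn.execute(
                "INSERT INTO outbox (event_id, topic, payload) VALUES (?, ?, ?)",
                (
                    str(uuid.uuid4()),
                    "orders",
                    json.dumps({"order_id": order_id, "item": item}),
                ),
            )
            if fail_before_commit:
                # Hypothetical flag used only to simulate a crash mid-transaction.
                raise RuntimeError("simulated crash before commit")
    except RuntimeError:
        return None
    return order_id


def relay_outbox(conn, publish):
    """Publish unsent outbox rows in insertion order, then mark them as sent.

    If the process dies between publish() and the UPDATE, the row is sent
    again on the next run. Delivery is therefore at-least-once, and
    consumers should deduplicate on event_id.
    """
    rows = conn.execute(
        "SELECT seq, event_id, topic, payload FROM outbox WHERE sent = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_id, topic, payload in rows:
        publish(topic, event_id, payload)  # stand-in for a Kafka producer send
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

A short usage sketch, with a list standing in for the broker:

```python
conn = sqlite3.connect(":memory:")
create_schema(conn)
published = []

place_order(conn, "book")                          # order row + outbox row committed together
place_order(conn, "pen", fail_before_commit=True)  # rolled back: no order, no event
relay_outbox(conn, lambda topic, event_id, payload: published.append((topic, event_id, payload)))
```

The point of the design is that the only atomic step is a local database transaction, which the database already supports. Kafka's exactly-once guarantees never have to cover the database write; the relay only needs to retry until each row is published.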