From this article https://www.confluent.io/blog/transactions-apache-kafka/
Using vanilla Kafka producers and consumers configured for at-least-once delivery semantics, a stream processing application could lose exactly once processing semantics in the following ways:
2. We may reprocess the input message A, resulting in duplicate B messages being written to the output, violating the exactly once processing semantics. Reprocessing may happen if the stream processing application crashes after writing B but before marking A as consumed. Thus when it resumes, it will consume A again and write B again, causing a duplicate.
3. Finally, in distributed environments, applications will crash or—worse!—temporarily lose connectivity to the rest of the system. Typically, new instances are automatically started to replace the ones which were deemed lost. Through this process, we may have multiple instances processing the same input topics and writing to the same output topics, causing duplicate outputs and violating the exactly once processing semantics. We call this the problem of “zombie instances.”
QUESTION
On point #2, it says that when the application crashes, it will consume A and write B again. But doesn't producer idempotence already handle this case of sending duplicates, just like point #1?
Point #3 also results in duplicate sends. Shouldn't #2 and #3 be the same issue as #1, which can be handled using producer idempotence?
The idempotent producer only guarantees exactly-once semantics per partition and within the lifetime of a single producer instance.
So it is able to cover scenario 1).
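But if the producer crashes (or is even cleanly restarted), these guarantees no longer hold. The new producer instance gets a new producer ID, so the broker cannot recognise the resent message as a duplicate, and as you described in 2) and 3), duplicates may be written.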
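To make the difference between 1) and 2) concrete, here is a toy model in plain Java. It is only a sketch, not Kafka's actual implementation: `ToyPartition` and `IdempotenceSketch` are made-up names, and the only assumption taken from Kafka is that the broker de-duplicates by producer ID and sequence number, and that a restarted (non-transactional) producer gets a fresh producer ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for one output partition (hypothetical, not a Kafka class).
class ToyPartition {
    private final List<String> log = new ArrayList<>();
    // Highest sequence number accepted so far, per producer ID.
    private final Map<Long, Integer> lastSeqByProducer = new HashMap<>();

    // Appends the value unless this (producerId, sequence) was already accepted.
    boolean append(long producerId, int sequence, String value) {
        int last = lastSeqByProducer.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // duplicate caused by a retry: dropped
        }
        lastSeqByProducer.put(producerId, sequence);
        log.add(value);
        return true;
    }

    List<String> log() {
        return log;
    }
}

class IdempotenceSketch {
    static List<String> run() {
        ToyPartition output = new ToyPartition();

        // Scenario 1: producer 1 sends B, the ack is lost, and the client
        // retries internally with the same sequence number.
        output.append(1L, 0, "B");
        output.append(1L, 0, "B"); // dropped as a duplicate

        // Scenario 2: the application crashed before marking A as consumed.
        // The restarted instance reprocesses A with a new producer ID, and
        // its sequence numbers start again at 0.
        output.append(2L, 0, "B"); // accepted: the broker sees a new message

        return output.log();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model the retry in scenario 1 is dropped, but the write after the restart is accepted, so the log ends up with two copies of B.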
To tackle 2) and 3), you can use the transactional producer. It ensures that output messages are written and input offsets are committed atomically, so if there is a failure, intermediate work is discarded and a new instance starting up won't cause any duplication. "Zombie" instances are also fenced off and prevented from violating exactly-once semantics.
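As a rough sketch of what that setup involves, the snippet below builds the relevant client settings with plain `java.util.Properties`. The config keys are real Kafka client settings, but the class name, method names and parameters are illustrative assumptions, and serializers and other required settings are left out. The comment at the end outlines the usual `KafkaProducer` call sequence for a consume-transform-produce loop.

```java
import java.util.Properties;

// Hypothetical helper class; only the config keys come from Kafka.
class TransactionalConfigSketch {

    // transactionalId should stay the same across restarts of the same
    // logical instance, so that an older "zombie" can be fenced.
    static Properties producerProps(String bootstrapServers, String transactionalId) {
        Properties p = new Properties();
        p.put("bootstrap.servers", bootstrapServers);
        p.put("enable.idempotence", "true");
        p.put("transactional.id", transactionalId);
        return p;
    }

    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties p = new Properties();
        p.put("bootstrap.servers", bootstrapServers);
        p.put("group.id", groupId);
        // Offsets are committed through the producer's transaction instead.
        p.put("enable.auto.commit", "false");
        // Only read messages from committed transactions.
        p.put("isolation.level", "read_committed");
        return p;
    }

    // Outline of the processing loop (KafkaProducer methods, not run here):
    //   producer.initTransactions();      // once at startup; fences older instances
    //   producer.beginTransaction();
    //   producer.send(recordB);           // output derived from input A
    //   producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
    //   producer.commitTransaction();     // or abortTransaction() on failure
}
```

Because the write of B and the offset commit for A belong to one transaction, a crash between the two leaves neither visible to `read_committed` consumers.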
The tradeoff is guarantees versus speed: the transactional producer offers stronger guarantees but can reduce throughput and add latency.
In any case, which producer you pick depends on your requirements. See these two sections of the docs, which provide a bit more detail: