apache-kafka, idempotent, exactly-once

Can a Kafka idempotent producer ensure exactly-once delivery with multiple partitions?


I'm a newbie to Kafka and have learned a little about the idempotent Kafka producer.

As I understand it, when a producer sends a message to a broker, the broker needs to send back an ACK to tell the producer that the message has been received. If the producer doesn't receive the ACK for some reason, it has to send the same message to the broker again, so the message may be duplicated. The idempotent producer eliminates this issue.
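To make the lost-ACK problem concrete, here is a minimal Python simulation. `NaiveBroker` and `send_with_retry` are hypothetical names, not Kafka APIs; the broker simply appends whatever it receives, and the network is assumed to drop a configurable number of ACKs:

```python
class NaiveBroker:
    """Hypothetical broker that appends every record it receives, with no deduplication."""

    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(record)
        return True  # the ACK the broker sends back


def send_with_retry(broker, record, lost_acks):
    """Resend `record` until an ACK gets through; the first `lost_acks` ACKs are dropped."""
    while True:
        ack = broker.append(record)  # the write succeeds on every attempt
        if lost_acks > 0:
            lost_acks -= 1  # ACK lost in transit: the producer can't tell the write worked
            continue  # ...so it sends the same record again
        return ack


broker = NaiveBroker()
send_with_retry(broker, "X", lost_acks=1)
print(broker.log)  # ['X', 'X']
```

One lost ACK is enough to leave two copies of X in the log.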

Basically, each producer is assigned a PID (producer ID) and each message is assigned a sequence number, so the PID plus the sequence number identifies a message. This is how Kafka's idempotence works.
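A simplified sketch of deduplication by PID and sequence number, assuming a hypothetical `IdempotentPartition` class (this is not Kafka code). It appends a record only if its sequence number is the next one expected for that PID, and treats a repeated sequence number as a duplicate retry. In real Kafka the broker tracks sequence numbers per producer and per partition and works on record batches rather than single records, so this only shows the core idea:

```python
class IdempotentPartition:
    """Hypothetical, simplified partition log that deduplicates by (PID, sequence number)."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # PID -> last sequence number appended to this partition

    def append(self, pid, seq, value):
        last = self.last_seq.get(pid, -1)
        if seq <= last:
            return "duplicate"  # already stored: report a duplicate instead of appending
        if seq != last + 1:
            raise ValueError(f"out-of-order sequence {seq}, expected {last + 1}")
        self.log.append(value)
        self.last_seq[pid] = seq
        return "appended"


partition = IdempotentPartition()
print(partition.append(pid=7, seq=0, value="X"))  # appended
print(partition.append(pid=7, seq=0, value="X"))  # duplicate (a retry after a lost ACK)
print(partition.append(pid=7, seq=1, value="Y"))  # appended
print(partition.log)  # ['X', 'Y']
```

Because the check lives inside one partition's log, a repeated sequence number is only recognized as a duplicate when the retry reaches the same partition.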

If I'm right, let's assume that I create three partitions for one topic, and a producer sends messages to them with a round-robin algorithm, meaning the three partitions receive messages in turn. In this case, can Kafka still ensure idempotence?
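A minimal sketch of the round-robin assignment described above, assuming a hypothetical partitioner built on `itertools.cycle`. Kafka does ship a `RoundRobinPartitioner`, but this is not its code, just the same idea of handing each new message to the next partition:

```python
from itertools import cycle

# Hypothetical round-robin partitioner: each new message goes to the next partition in turn.
partitioner = cycle(["a", "b", "c"])

messages = ["m0", "m1", "m2", "m3", "m4"]
assignments = [(message, next(partitioner)) for message in messages]
print(assignments)
# [('m0', 'a'), ('m1', 'b'), ('m2', 'c'), ('m3', 'a'), ('m4', 'b')]
```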

For example, there are three partitions a, b and c.

At some moment, the producer sends message X to partition a. Partition a receives X successfully but fails to send back the ACK, so the producer resends X. Now I have two questions:

  1. Which partition will receive the resent message X: partition a or partition b?
  2. If it's partition b, does that mean partitions a and b will both contain message X, so Kafka can't ensure idempotence in this case?

Solution

  • At some moment, the producer sends message X to partition a. Partition a receives X successfully but fails to send back the ACK, so the producer resends X.

    Which partition will receive the resent message X: partition a or partition b?

    Resends are done internally by the producer client; we don't do them in application code. So when message X sent to partition a is not acknowledged, it is resent to the same partition. If we resend manually in application code, then yes, there will be duplicates.

    If the partitioning logic is round-robin, it is the next message that is sent to the next partition. Partitioning logic doesn't apply to resends: if a message send fails, it is resent to the same partition.

    If it's partition b, does that mean partitions a and b will both contain message X, so Kafka can't ensure idempotence in this case?

    This doesn't apply, because resends always go to the same partition. The partitioning logic runs only once, before the message is first sent, not for every retry.
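A minimal simulation of both answers, assuming hypothetical `Partition` and `Producer` classes (a sketch, not the Kafka client's implementation). The producer runs round-robin partitioning once per record and assigns a per-partition sequence number; internal retries reuse that partition and sequence number, so the broker-side check drops the duplicate. A resend done by the application, by contrast, is a new record: it goes through the partitioner again and gets a fresh sequence number.

```python
from itertools import cycle


class Partition:
    """Hypothetical partition log that ignores records whose (PID, seq) it has already stored."""

    def __init__(self, name):
        self.name = name
        self.log = []
        self.last_seq = {}  # PID -> highest sequence number appended to this partition

    def append(self, pid, seq, value):
        if seq <= self.last_seq.get(pid, -1):
            return  # a retry of a record already in this partition: drop it
        self.log.append(value)
        self.last_seq[pid] = seq


class Producer:
    """Hypothetical producer: chooses a partition once per record; retries reuse it."""

    def __init__(self, pid, partitions):
        self.pid = pid
        self.round_robin = cycle(partitions)
        self.next_seq = {p.name: 0 for p in partitions}  # sequence numbers are per partition

    def send(self, value, lost_acks=0):
        partition = next(self.round_robin)  # partitioning logic runs exactly once
        seq = self.next_seq[partition.name]
        self.next_seq[partition.name] += 1
        for _ in range(lost_acks + 1):  # original attempt plus one internal retry per lost ACK
            partition.append(self.pid, seq, value)  # same partition, same seq every time
        return partition.name


# Internal retry: the first ACK for X is lost and the client resends it on its own.
retry_parts = [Partition(name) for name in "abc"]
Producer(pid=1, partitions=retry_parts).send("X", lost_acks=1)
print([(p.name, p.log) for p in retry_parts])
# [('a', ['X']), ('b', []), ('c', [])]

# Manual resend: the application calls send("X") a second time.
manual_parts = [Partition(name) for name in "abc"]
producer = Producer(pid=1, partitions=manual_parts)
producer.send("X")
producer.send("X")  # a new record: next partition, fresh sequence number
print([(p.name, p.log) for p in manual_parts])
# [('a', ['X']), ('b', ['X']), ('c', [])]
```

In the first run partition a ends up with a single X; in the second, the manual resend lands in partition b, which is exactly the duplicate the question worries about.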