java apache-kafka kafka-consumer-api

What is the impact of delaying a manual Kafka offset commit?


We want to commit Kafka offsets manually to avoid losing events, but the commit may be delayed because we only want to commit after the data has been persisted to the datasource.

I would like to understand how delaying the offset commit affects the Kafka topic, its partitions, or consumer parallelism, if at all.


Solution

  • When the consumers of a topic belong to the same consumer group, Kafka makes sure each partition is consumed by exactly one consumer in that group. Committing manually therefore does not affect the other consumers, because they are consuming from other partitions.

    If you compare consumers of the same partition with enable.auto.commit=false and enable.auto.commit=true, the throughput of the auto-commit consumer is relatively high. If you do not need to wait for confirmation of each commit, use commitAsync; it gives better throughput than commitSync, which blocks until the broker responds.

    Generally, you call the commit API when you have finished processing all the messages in a batch, and you do not poll for new messages until the last offset in the batch is committed. This approach can affect throughput and latency, as can the number of messages returned per poll, so you can set up your application to commit less frequently.
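
    The settings behind this trade-off can be sketched as plain consumer properties. The property keys are standard Kafka consumer configs; the broker address, the group id, the class name and the chosen values are assumptions for illustration only. Note that it is the gap between two poll() calls (bounded by max.poll.interval.ms), rather than the commit delay itself, that can get a slow consumer removed from its group.

    ```java
    import java.util.Properties;

    public class ManualCommitConfig {
        // Builds properties for a consumer that persists first and commits afterwards.
        public static Properties consumerProps() {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");  // assumed broker address
            props.setProperty("group.id", "persist-then-commit");      // hypothetical group id
            props.setProperty("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.setProperty("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            // Offsets are committed by the application, not by a background timer.
            props.setProperty("enable.auto.commit", "false");
            // Smaller batches mean less work (and less replay) between two commits.
            props.setProperty("max.poll.records", "100");
            // Persisting one batch must finish within this window (5 minutes here),
            // otherwise the group rebalances and the partition is reassigned.
            props.setProperty("max.poll.interval.ms", "300000");
            return props;
        }

        public static void main(String[] args) {
            consumerProps().forEach((k, v) -> System.out.println(k + "=" + v));
        }
    }
    ```

    These properties would be passed to the KafkaConsumer constructor; the values shown are starting points to tune against how long one batch takes to persist.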

    With manual commits, however, messages can be consumed more than once when a consumer restarts or the group rebalances. Suppose you consume a message, write it to your database, and only then commit its offset to Kafka. If the consumer restarts or a rebalance happens after the write but before the commit, the offset is never committed, so the message is delivered again, either to the restarted consumer or to another consumer in the same group.
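
    The replay described above can be illustrated with a small in-memory simulation that needs no broker. Everything here is hypothetical: DelayedCommitDemo, consumeBatch and the fields are stand-ins, with the list playing the role of one partition and the map playing the role of the datasource. With the real client the same steps would roughly correspond to KafkaConsumer.poll, your own persistence code, and commitSync. The sketch assumes the datasource can upsert by offset, which is what makes the replay harmless.

    ```java
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class DelayedCommitDemo {
        public final List<String> log;                    // records of one partition
        public int committedOffset = 0;                   // where a restarted consumer resumes
        public final Map<Integer, String> store = new HashMap<>(); // datasource keyed by offset
        public int writes = 0;                            // write attempts, replays included

        public DelayedCommitDemo(List<String> log) {
            this.log = log;
        }

        // One poll / persist / commit cycle.
        // Returns false when the simulated crash happens before the commit.
        public boolean consumeBatch(int maxRecords, boolean crashBeforeCommit) {
            int from = committedOffset;                   // "poll" starts at the committed offset
            int to = Math.min(from + maxRecords, log.size());
            for (int offset = from; offset < to; offset++) {
                store.put(offset, log.get(offset));       // idempotent upsert by offset
                writes++;
            }
            if (crashBeforeCommit) {
                return false;                             // data persisted, offset NOT committed
            }
            committedOffset = to;                         // the delayed manual commit
            return true;
        }

        public static void main(String[] args) {
            DelayedCommitDemo demo = new DelayedCommitDemo(Arrays.asList("a", "b", "c", "d"));
            demo.consumeBatch(2, false);  // a, b persisted and committed
            demo.consumeBatch(2, true);   // c, d persisted, crash before commit
            demo.consumeBatch(2, false);  // c, d delivered again, then committed
            System.out.println("writes=" + demo.writes
                    + " rows=" + demo.store.size()
                    + " committed=" + demo.committedOffset);
            // prints "writes=6 rows=4 committed=4"
        }
    }
    ```

    The two extra writes are the duplicates; because the store is keyed by offset they overwrite the same rows instead of adding new ones. In a real system the deduplication key would more likely be the topic, partition and offset together, or a business key carried in the message.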

    For more information, please refer