pythonapache-kafkalibrdkafka

Kafka offset management: enable.auto.commit vs enable.auto.offset.store


A Kafka Consumer by default periodically commits the current offsets unless it is turned off by disabling enable.auto.commit. According to the documentation you're then responsible for committing the offsets yourself. So when I want manual control, that seems to be the way to go, however the documentation also mentions the stored offsets and that if you want manual control you should disable enable.auto.offset.store and use rd_kafka_offsets_store() and leave the auto-commit untouched.

Can somebody explain why the latter approach is to be preferred? Disabling auto-commits should have the exact same effect?


Solution

  • With enable.auto.commit=true librdkafka will commit the last stored offset for each partition at regular intervals, at rebalance, and at consumer shutdown.

    The offsets that are used here, are taken from an in-memory offset store. This store will be updated automatically when enable.auto.offset.store=true.

    If you set enable.auto.offset.store=false you can update this in-memory offset store by yourself via rd_kafka_offsets_store().

    This is preferred over disabling enable.auto.commit because you do not have to reimplement calling commit at regular intervals yourself, but can rely on the already built-in logic instead.

    You have manual control about whether or not offsets are committed either way, but disabling enable.auto.commit and calling commit yourself will most likely lead to more frequent commits.