I'm using Kafka's high-level consumer. Because I'm using Kafka as a 'queue of transactions' for my application, I need to make absolutely sure I don't miss or re-read any messages. I have 2 questions regarding this:
How do I commit the offset to ZooKeeper? I will turn off auto-commit and commit the offset after every successfully consumed message. I can't seem to find actual code examples of how to do this with the high-level consumer. Can anyone help me with this?
On the other hand, I've heard that committing to ZooKeeper might be slow, so an alternative may be to keep track of the offsets locally. Is this alternative advisable? If yes, how would you approach it?
Two settings from http://kafka.apache.org/documentation.html#consumerconfigs are relevant:
auto.commit.enable
and
auto.commit.interval.ms
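For illustration, here is a minimal sketch of how these two settings could be placed in a java.util.Properties object, which is what the high-level consumer's configuration is built from. The zookeeper.connect and group.id values, the class name, and the method name are placeholders I made up for the example, not values from the question.

```java
import java.util.Properties;

public class ConsumerProps {
    // Builds properties for a consumer that commits offsets automatically on a timer.
    static Properties autoCommitProps(long intervalMs) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder address (assumption)
        props.put("group.id", "transactions-group");      // placeholder group name (assumption)
        props.put("auto.commit.enable", "true");
        props.put("auto.commit.interval.ms", Long.toString(intervalMs));
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting key/value pairs for inspection.
        autoCommitProps(1000L).list(System.out);
    }
}
```

Setting auto.commit.enable to "false" instead turns the timer off entirely, in which case auto.commit.interval.ms has no effect.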
If you want the consumer to commit the offset after each message, that is difficult to achieve through configuration alone, because the only trigger these settings offer is a timer interval, not a per-message commit. You would have to estimate the rate of incoming messages and set the interval accordingly.
In general, keeping this interval very small is not recommended: it greatly increases the read/write rate on ZooKeeper, and ZooKeeper slows down under that load because it is strongly consistent across its quorum.
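To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes a steady message rate and treats each timer tick as one commit round; the class and method names are made up for the example. A shorter interval means fewer messages sit uncommitted (and could be re-read after a crash), but more commit rounds hit ZooKeeper every second.

```java
public class CommitIntervalEstimate {
    // Interval (ms) at which roughly one message arrives between two commits,
    // given an estimated steady rate. Never returns less than 1 ms.
    static long intervalForOneMessagePerCommit(double messagesPerSecond) {
        if (messagesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return Math.max(1L, Math.round(1000.0 / messagesPerSecond));
    }

    // Rough upper estimate of messages consumed but not yet committed
    // if the consumer dies just before the next timer tick.
    static long uncommittedMessagesAtRisk(double messagesPerSecond, long intervalMs) {
        return (long) Math.ceil(messagesPerSecond * intervalMs / 1000.0);
    }

    // Number of commit rounds per second implied by the interval.
    static double commitRoundsPerSecond(long intervalMs) {
        return 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        double rate = 50.0; // assumed messages per second
        long interval = intervalForOneMessagePerCommit(rate);
        System.out.println("interval ms: " + interval);
        System.out.println("commit rounds/s: " + commitRoundsPerSecond(interval));
        System.out.println("at risk with 1000 ms interval: "
                + uncommittedMessagesAtRisk(rate, 1000L));
    }
}
```

This is only an estimate: real traffic is bursty, so a timer tuned this way will sometimes cover several messages and sometimes none.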