apache-kafka offset retention

How to identify a specific message in Kafka


As I understand it, a Kafka message can be identified by its topic, partition, and offset. If I store each message along with its topic, partition, and offset in my local database, I can check against it when a new Kafka message is received to ensure I don't insert the same message twice.

However, by default a Kafka topic has a retention policy that keeps messages for only 7 days, after which they are removed.

My question is: after a Kafka message is removed by the retention policy, will its offset be reused for a new message? If so, I could mistake a new message for an existing one because they would share the same offset. Please advise how offsets work under the retention policy and how to handle this. Thank you!


Solution

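  • No. Offsets within a partition only ever increase, and retention removes old messages without resetting that counter, so an offset is not reused as long as the topic (or the whole cluster) is not deleted and recreated. It is common to store the last processed offset (e.g. in a database, or automatically via consumer groups) to know up to which point a consumer has processed a topic.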
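
As a rough illustration of keeping the offset in your own database, here is a minimal sketch using Python's built-in `sqlite3`. It does not use a Kafka client: the `(topic, partition, offset, value)` tuples stand in for whatever your consumer yields, and the table name `processed_offsets` and the functions `open_store`, `last_processed`, and `process_once` are hypothetical names chosen for this example. It also assumes records of one partition arrive in offset order, so only the highest processed offset per partition needs to be stored.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small table holding one row per topic partition."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS processed_offsets (
               topic        TEXT    NOT NULL,
               partition_id INTEGER NOT NULL,
               last_offset  INTEGER NOT NULL,
               PRIMARY KEY (topic, partition_id)
           )"""
    )
    return conn


def last_processed(conn, topic, partition):
    """Return the highest offset already handled, or -1 if none is stored."""
    row = conn.execute(
        "SELECT last_offset FROM processed_offsets "
        "WHERE topic = ? AND partition_id = ?",
        (topic, partition),
    ).fetchone()
    return row[0] if row else -1


def process_once(conn, topic, partition, offset, value, handler):
    """Call handler(value) only for offsets beyond the stored one.

    Returns True if the record was handled, False if it was skipped
    as a duplicate.
    """
    if offset <= last_processed(conn, topic, partition):
        return False
    with conn:  # commits on success, rolls back if handler raises
        handler(value)
        conn.execute(
            "INSERT OR REPLACE INTO processed_offsets "
            "(topic, partition_id, last_offset) VALUES (?, ?, ?)",
            (topic, partition, offset),
        )
    return True


# Simulated records; offset 42 on partition 0 is delivered twice.
records = [
    ("orders", 0, 41, "a"),
    ("orders", 0, 42, "b"),
    ("orders", 0, 42, "b"),
    ("orders", 1, 42, "c"),  # same offset, different partition: not a duplicate
]

conn = open_store()
seen = []
for topic, partition, offset, value in records:
    process_once(conn, topic, partition, offset, value, seen.append)

print(seen)  # ['a', 'b', 'c']
```

Because offsets are not reused after retention removes old messages, the stored value stays meaningful even when the messages it refers to are gone. If the handler writes to the same database, doing that write through the same connection inside `process_once` would let the data and the offset be committed together.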