How can we guarantee Kafka exactly-once semantics in a read-process scenario? By "read-process" I mean that we read from a Kafka topic, do some processing, and then commit the offset. Suppose we processed the messages but the process crashed before the offset could be committed. After the restart, the consumer receives the same messages again. How should such scenarios be handled? Can this be solved with the Kafka Transactions API?
There is a similar question, but I could not fully understand it and left a few comments there as well. I just want to confirm my understanding. Confused about Kafka exactly-once semantics
Kafka transactions offer exactly-once semantics (EOS) for consume-process-produce scenarios. They work by having the producer commit the consumed offsets instead of the consumer: writing the results to Kafka and committing the offsets of the consumed messages both happen inside a single producer transaction (rather than through a separate consumer and producer), so either both take effect or neither does. This ensures that each consumed message yields exactly one result in Kafka (a result may consist of multiple messages), but the message itself may still be processed more than once in failure scenarios.
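A minimal sketch of such a consume-process-produce loop, assuming the third-party confluent-kafka Python client. The broker address, topic names, group id, transactional id, and the `process` function are placeholders made up for illustration, not anything from the question:

```python
def process(value: bytes) -> bytes:
    """Hypothetical business logic: here it just upper-cases the payload."""
    return value.upper()


def run_transactional_loop(brokers="localhost:9092",
                           in_topic="input", out_topic="output"):
    # Third-party client (pip install confluent-kafka), imported lazily so
    # this module can be loaded without it. All names below are assumptions.
    from confluent_kafka import Consumer, Producer

    consumer = Consumer({
        "bootstrap.servers": brokers,
        "group.id": "eos-demo",
        "enable.auto.commit": False,          # offsets go through the producer
        "isolation.level": "read_committed",  # skip aborted transactions
        "auto.offset.reset": "earliest",
    })
    producer = Producer({
        "bootstrap.servers": brokers,
        "transactional.id": "eos-demo-1",
    })
    consumer.subscribe([in_topic])
    producer.init_transactions()

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        producer.begin_transaction()
        try:
            # The result and the consumed offsets are committed atomically.
            producer.produce(out_topic, value=process(msg.value()))
            producer.send_offsets_to_transaction(
                consumer.position(consumer.assignment()),
                consumer.consumer_group_metadata(),
            )
            producer.commit_transaction()
        except Exception:
            # Neither the result nor the offsets become visible.
            producer.abort_transaction()
            raise
```

Note that if the process crashes between `process(...)` and `commit_transaction()`, the message is processed again after the restart. Only the output written to Kafka is exactly-once, not the processing itself.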
So you cannot achieve exactly-once in a plain read-process scenario. The only solution is to make the processing idempotent, i.e. change your business logic so that duplicate messages have no side effects, e.g.:
- Deduplicate: if you write to a database, check whether the incoming message has already been applied before you insert or process it, and drop it if so.
- If duplicates would corrupt your database, store the offsets in the database as well, so that the data insertion and the offset commit happen in one database transaction.
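Both ideas can be sketched with Python's built-in `sqlite3` module standing in for your database. The table layout and the function names (`open_store`, `handle`, `resume_offset`) are assumptions made up for this illustration. The message is keyed by (topic, partition, offset), and the offset to resume from is written in the same transaction as the data:

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the (hypothetical) result and offset tables."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "topic TEXT, part INTEGER, off INTEGER, payload TEXT, "
        "PRIMARY KEY (topic, part, off))"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS offsets ("
        "topic TEXT, part INTEGER, next_off INTEGER, "
        "PRIMARY KEY (topic, part))"
    )
    db.commit()
    return db


def handle(db, topic, part, off, payload):
    """Apply one message; return False if it was a duplicate."""
    with db:  # one transaction: commit on success, rollback on exception
        cur = db.execute(
            "INSERT OR IGNORE INTO results VALUES (?, ?, ?, ?)",
            (topic, part, off, payload),
        )
        if cur.rowcount == 0:
            return False  # already applied, drop the redelivered message
        db.execute(
            "INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
            (topic, part, off + 1),
        )
        return True


def resume_offset(db, topic, part):
    """Offset to seek to after a restart (0 if nothing was stored yet)."""
    row = db.execute(
        "SELECT next_off FROM offsets WHERE topic = ? AND part = ?",
        (topic, part),
    ).fetchone()
    return row[0] if row else 0
```

On startup the consumer would seek each assigned partition to `resume_offset(...)` instead of relying on the offsets committed to Kafka, so a crash between processing and committing can no longer produce a second insert.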