kubernetesapache-kafkakafka-transactions-api

How to choose Kafka transactional.id in a Kubernetes (Producer side only transaction) set up


I have a Kafka wrapper library that uses transactions on the produce side only. The library does not cover the consumer. The producer publishes to multiple topics. The goal is to achieve transactionality. So the produce should either succeed which means there should be exactly once copy of the message written in each topic, or fail which means message was not written to any topics. The users of the library are applications that run on Kubernetes pods. Hence, the pods could fail, or restart frequently. Also, the partition is not going to be explicitly set upon sending the message.

My question is, how should I choose the transactional.id for producers? My first idea is to simply choose UUID upon object initiation, as well as setting a transaction.timeout.ms to some reasonable time (a few seconds). That way, if a producer gets terminated due to pod restart, the consumers don't get locked on the transaction forever.

Are there any flaws with this strategy? Is there a smarter way to do this? Also, I cannot ask the library user for some kind of id.


Solution

  • UUID can be used in your library to generate transaction id for your producers. I am not really sure what you mean by: That way, if a producer gets terminated due to pod restart, the consumers don't get locked on the transaction forever.

    Consumer is never really "stuck". Say the producer goes down after writing message to one topic (and hence transaction is not yet committed), then consumer will behave in one of the following ways: