apache-kafka, spring-kafka, kafka-transactions-api

Transactional.id in a spring transactional kafka producer


In my application, I have producers publishing to a Kafka topic (with only 1 partition) and multiple consumers (each in its own consumer group) consuming from that topic. Now, for some reason, I need to use the transactional Kafka producer (using the spring-kafka library).

My question is regarding the transactional.id prefix. The docs here specify how to pick a transactional.id, but I think that is more relevant for use cases where you have a consume-process-produce cycle (and even then, for multiple partitions/topics).

For my simple use case, is it sufficient for the transactional.id to be a random string? Does it have to be the same across process restarts or in any other scenarios?

Tried reading many docs on this topic but couldn't get clarity. Thanks


Solution

  • Yes, the transactional.id must be unique for each producer instance to avoid fencing. It does not have to be the same after each restart. However, there might be a performance hit on the consumer side after a restart: if a producer died and left a partial transaction in the log, the consumer has to wait for that transaction to time out before proceeding to the next available record. This can be mitigated by reducing the transaction timeout (transaction.timeout.ms, default 1 minute). https://kafka.apache.org/documentation/#producerconfigs_transaction.timeout.ms

    In fact, that article is a bit out of date; since EOS mode V2 (BETA), the transactional.id can be unique per producer instance, even for exactly-once consume->process->produce sequences.

    Previously, a different producer was required for each group/topic/partition for these scenarios.
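
As a rough illustration of the two settings discussed above, here is a minimal sketch that builds producer properties with a random, per-instance transactional.id and a shortened transaction timeout. The class name TxProducerProps, the prefix "my-app-tx-" and the 10-second timeout are assumptions chosen for the example, not values from the question; only the config keys transactional.id and transaction.timeout.ms are real Kafka producer settings.

```java
import java.util.Properties;
import java.util.UUID;

// Hypothetical helper for illustration; not part of Kafka or spring-kafka.
final class TxProducerProps {

    // Builds producer properties with a transactional.id that is unique per call.
    static Properties build(String prefix, int transactionTimeoutMs) {
        Properties props = new Properties();
        // Random suffix: unique per producer instance, and different after every restart.
        props.setProperty("transactional.id", prefix + UUID.randomUUID());
        // A shorter timeout limits how long an abandoned transaction stays open.
        props.setProperty("transaction.timeout.ms", Integer.toString(transactionTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // Prefix and timeout are example values (assumptions).
        Properties props = build("my-app-tx-", 10_000);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With spring-kafka, the prefix would normally be handed to DefaultKafkaProducerFactory.setTransactionIdPrefix rather than set as a raw property, and the factory adds its own suffix per producer; a random prefix generated at startup should give the same effect, though that wiring is not shown here.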