In my application, I have producers publishing to a Kafka topic (with only 1 partition) and multiple consumers (each consumer is in its own consumer group) consuming from the topic. Now, for some reason I need to use the transactional Kafka producer (using the spring-kafka library).
My question is about the transactional.id prefix. The docs here specify how to pick a transactional.id, but I think that guidance is more relevant for use cases where you have a consume-process-produce cycle (and even then, for multiple partitions/topics).
For my simple use case, is it sufficient for the transactional.id to be a random string? Does it have to be the same across process restarts or in any other scenarios?
Tried reading many docs on this topic but couldn't get clarity. Thanks
Yes, the transactional.id must be unique for each producer instance to avoid fencing. It does not have to be the same on each restart. However, there might be a performance hit on the consumer side after a restart: if a producer died and left a partial transaction in the log, a consumer reading with isolation.level=read_committed has to wait for that transaction to time out before it can proceed to the next available record. This can be mitigated by reducing the transaction timeout (default 1 minute). https://kafka.apache.org/documentation/#producerconfigs_transaction.timeout.ms
In fact, that article is a bit out of date; since EOS mode V2 (BETA), the transactional.id can be unique per producer instance, even for exactly-once consume->process->produce sequences. Previously, a different producer was required for each group/topic/partition in these scenarios.
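As a rough illustration of the advice above, here is a minimal sketch that builds plain Kafka producer properties with a random, per-instance transactional.id and a shortened transaction timeout. The class name, the helper `producerProps`, the `orders-producer` name, the broker address and the 10-second timeout are all assumptions made up for this example; only the property keys are real Kafka producer settings. With spring-kafka you would normally not set transactional.id directly but pass a prefix to `DefaultKafkaProducerFactory.setTransactionIdPrefix`, so treat this as a sketch of the idea rather than the exact wiring.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class TransactionalProducerConfig {

    // Hypothetical helper (not part of Kafka or spring-kafka): builds raw producer properties.
    static Map<String, Object> producerProps(String bootstrapServers, String appName) {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", bootstrapServers);

        // Unique per producer instance. A fresh random suffix on every start is
        // acceptable for a produce-only use case; it only has to avoid clashing
        // with another live producer, which would cause fencing.
        props.put("transactional.id", appName + "-" + UUID.randomUUID());

        // Lower than the 60000 ms default so that a transaction left open by a
        // crashed producer is aborted sooner. 10 seconds is an assumed value.
        props.put("transaction.timeout.ms", 10_000);
        return props;
    }

    public static void main(String[] args) {
        // Assumed broker address and application name, for illustration only.
        Map<String, Object> props = producerProps("localhost:9092", "orders-producer");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The trade-off is the one described in the answer: a random id means a restarted producer cannot take over and abort its predecessor's open transaction, so the shorter timeout is what limits how long read_committed consumers are held up.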