apache-kafka librdkafka

When using librdkafka, what is the expected behaviour of delivery callbacks inside a transaction?


I'm doing some experiments with Kafka using librdkafka, in the context of Kafka's exactly-once delivery feature.

Librdkafka has a feature called delivery callbacks, from the documentation:

The delivery report callback will be called once for each message accepted by RdKafka::Producer::produce() (et al.) with RdKafka::Message::err() set to indicate the result of the produce request. The callback is called when a message is successfully produced or if librdkafka encountered a permanent failure, or the retry counter for temporary errors has been exhausted. An application must call RdKafka::poll() at regular intervals to serve queued delivery report callbacks.

Documentation for Kafka transactions states:

The Kafka consumer will only deliver transactional messages to the application if the transaction was actually committed. Put another way, the consumer will not deliver transactional messages which are part of an open transaction, and nor will it deliver messages which are part of an aborted transaction.

I feel there is some ambiguity as to what the expected behaviour of delivery callbacks is when they are used within a transaction. Having tried it out, I think the two features are independent of each other and also compatible with each other.

If a delivery callback is requested by a produce call during a transaction, the delivery callback proceeds as usual and may be called before the transaction is completed.

If a transaction aborts, a read_committed consumer will not see the related message, even though a delivery callback has already been called for it.
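The two observations above can be illustrated with a small, self-contained toy model. It is only a sketch: `ToyBroker`, `LogEntry` and `TxnState` are hypothetical names invented for illustration and are not part of librdkafka or Kafka, and the model ignores offsets, partitions and ordering. It only shows that the delivery report follows the write to the log, while visibility to a read_committed reader follows the outcome of the transaction.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy model only: none of these types belong to librdkafka.
enum class TxnState { Open, Committed, Aborted };

struct LogEntry {
    std::string payload;
    std::size_t txn_id;  // transaction the message was produced in
};

class ToyBroker {
public:
    // Start a new transaction and return its id.
    std::size_t begin() {
        states_.push_back(TxnState::Open);
        return states_.size() - 1;
    }

    // Append to the log and fire the delivery report straight away: the
    // report reflects the write itself, not the fate of the transaction.
    void produce(std::size_t txn_id, const std::string& payload,
                 const std::function<void(const std::string&)>& delivery_cb) {
        assert(states_.at(txn_id) == TxnState::Open);
        log_.push_back({payload, txn_id});
        delivery_cb(payload);
    }

    void commit(std::size_t txn_id) { states_.at(txn_id) = TxnState::Committed; }
    void abort(std::size_t txn_id) { states_.at(txn_id) = TxnState::Aborted; }

    // What a read_committed reader would see in this model:
    // only messages whose transaction was committed.
    std::vector<std::string> read_committed() const {
        std::vector<std::string> out;
        for (const LogEntry& entry : log_) {
            if (states_.at(entry.txn_id) == TxnState::Committed) {
                out.push_back(entry.payload);
            }
        }
        return out;
    }

private:
    std::vector<LogEntry> log_;
    std::vector<TxnState> states_;
};
```

In this model, producing a message into a transaction that is later aborted still invokes the callback once, yet `read_committed()` never returns that message.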

I'd like to check whether this is the correct understanding of how these features work together. Also, is it then recommended (if performance criteria allow and poll is called correctly) to wait for the delivery callback before committing the transaction?


Solution

  • Yes, your understanding is correct.

    Delivery reports operate at the producer level: once a message has been acknowledged by the broker, the producer deems it produced and triggers the delivery report, regardless of transaction state, since the message has in fact been written to the log.

    For a standard non-transactional producer, delivery reports are essential, since they are the only way for the application to know whether its messages were in fact produced.

    But for the transactional producer you don't really need to use delivery reports, since you can simply rely on a successful commit_transaction() to implicitly report all messages as successful. Even in the transactional case delivery reports can still be useful, in that they give more fine-grained error reporting when delivery fails, but they're not strictly needed the way they are for the standard producer.

    With this in mind I typically find it easier to write transactional producer applications than standard ones, since there's in essence only a single point of failure - the commit_transaction() call.
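The "single point of failure" pattern from the answer can be sketched as follows. This is a minimal illustration under stated assumptions, not librdkafka code: `FakeTxnProducer` and `send_batch` are hypothetical names, and the fake reports the commit outcome as a plain `bool`. In librdkafka's C++ API the corresponding calls on `RdKafka::Producer` are `begin_transaction()`, `produce()`, `commit_transaction()` and `abort_transaction()`, and the transaction calls report failure through a returned `RdKafka::Error*`. Real code would inspect that error (for example whether it is retriable or requires an abort) instead of aborting unconditionally.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a transactional producer; NOT the librdkafka API.
// It buffers messages and can be configured to fail at commit time.
class FakeTxnProducer {
public:
    explicit FakeTxnProducer(bool commit_succeeds)
        : commit_succeeds_(commit_succeeds) {}

    void begin_transaction() {
        assert(!open_);
        open_ = true;
        pending_.clear();
    }

    void produce(const std::string& payload) {
        assert(open_);
        pending_.push_back(payload);
    }

    // Returns true only if every message in the transaction was committed.
    bool commit_transaction() {
        assert(open_);
        if (!commit_succeeds_) return false;  // transaction stays open
        committed_.insert(committed_.end(), pending_.begin(), pending_.end());
        pending_.clear();
        open_ = false;
        return true;
    }

    void abort_transaction() {
        assert(open_);
        pending_.clear();
        open_ = false;
    }

    const std::vector<std::string>& committed() const { return committed_; }

private:
    bool commit_succeeds_;
    bool open_ = false;
    std::vector<std::string> pending_;
    std::vector<std::string> committed_;
};

// Application-side pattern: no per-message bookkeeping, one outcome to check.
template <typename Producer>
bool send_batch(Producer& producer, const std::vector<std::string>& batch) {
    producer.begin_transaction();
    for (const std::string& msg : batch) {
        producer.produce(msg);
    }
    if (producer.commit_transaction()) {
        return true;  // every message in the batch succeeded
    }
    producer.abort_transaction();  // none of the batch becomes visible
    return false;
}
```

The application never tracks individual messages here. A `true` result from `send_batch` stands for the whole batch and a `false` result means none of it was committed, which is the simplification the answer describes.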