apache-kafka kafka-consumer-api kafka-transactions-api

How will a consumer read committed messages?


As far as I understand from Transactions in Apache Kafka, a read_committed consumer will not return messages that are part of an ongoing transaction. So I guess the consumer can either commit its offset past those ongoing-transaction messages (e.g. to read non-transactional messages) or make no further progress until the encountered transaction is committed or aborted. I suppose Kafka allows it to skip those pending-transaction records, but then how will the consumer read them once they are committed, given that its offset might already be far ahead?

UPDATE

Consider that the topic could have a mix of records (aka messages) coming from non-transactional producers and transactional ones. E.g., consider this partition from a topic:

non-transact-Xmsg, from-transact-producer1-msg, from-transact-producer2-msg, non-transact-Ymsg

If the consumer encounters from-transact-producer1-msg, will it skip that message and go on to read non-transact-Ymsg, or will it just hang before the not-yet-committed from-transact-producer1-msg and therefore not read non-transact-Ymsg?

Consider also that there could be many transactional producers, and so many equivalents of from-transact-producer1-msg, some committed and some not. So from-transact-producer2-msg could already be committed at the point when the consumer reaches non-transact-Xmsg.


Solution

  • From the docs on isolation.level:

    Messages will always be returned in offset order. Hence, in read_committed mode, consumer.poll() will only return messages up to the last stable offset (LSO), which is the one less than the offset of the first open transaction. In particular any messages appearing after messages belonging to ongoing transactions will be withheld until the relevant transaction has been completed. As a result, read_committed consumers will not be able to read up to the high watermark when there are in flight transactions.
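To make the quoted rule concrete for the partition in the question, here is a minimal sketch in plain Java. It is not Kafka's code and uses no Kafka classes: `LsoSketch`, `Rec`, `TxState`, and `partition` are hypothetical names made up for illustration. It assumes a simplified model in which the LSO is the offset of the first record belonging to a still-open transaction, and a poll returns only records below that offset. In a real consumer this behaviour is enabled by setting `isolation.level=read_committed`.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of read_committed visibility; NOT Kafka's implementation.
public class LsoSketch {
    // Hypothetical per-record state; NONE marks a non-transactional record.
    enum TxState { NONE, OPEN, COMMITTED, ABORTED }

    static final class Rec {
        final long offset;
        final String value;
        final TxState state;

        Rec(long offset, String value, TxState state) {
            this.offset = offset;
            this.value = value;
            this.state = state;
        }
    }

    // Assumption: the LSO is modeled as the offset of the first record that
    // belongs to an open transaction, or the log end offset if there is none.
    // The log is assumed to be sorted by offset.
    static long lastStableOffset(List<Rec> log) {
        for (Rec r : log) {
            if (r.state == TxState.OPEN) {
                return r.offset;
            }
        }
        return log.isEmpty() ? 0L : log.get(log.size() - 1).offset + 1;
    }

    // Returns, in offset order, the values a read_committed poll would expose
    // starting at `position`: everything below the LSO except aborted records.
    static List<String> pollReadCommitted(List<Rec> log, long position) {
        long lso = lastStableOffset(log);
        List<String> visible = new ArrayList<>();
        for (Rec r : log) {
            if (r.offset < position) {
                continue; // already consumed
            }
            if (r.offset >= lso) {
                break; // withheld, even if this record itself is committed
            }
            if (r.state != TxState.ABORTED) {
                visible.add(r.value);
            }
        }
        return visible;
    }

    // Builds the partition from the question; only producer1's state varies.
    static List<Rec> partition(TxState producer1State) {
        List<Rec> log = new ArrayList<>();
        log.add(new Rec(0, "non-transact-Xmsg", TxState.NONE));
        log.add(new Rec(1, "from-transact-producer1-msg", producer1State));
        log.add(new Rec(2, "from-transact-producer2-msg", TxState.COMMITTED));
        log.add(new Rec(3, "non-transact-Ymsg", TxState.NONE));
        return log;
    }

    public static void main(String[] args) {
        // producer1's transaction still open: only the first record is visible.
        System.out.println(pollReadCommitted(partition(TxState.OPEN), 0));
        // producer1 has committed: the remaining records arrive in offset order.
        System.out.println(pollReadCommitted(partition(TxState.COMMITTED), 1));
    }
}
```

Under this model the consumer does not skip from-transact-producer1-msg. While that transaction is open, the already-committed from-transact-producer2-msg and the non-transactional non-transact-Ymsg are withheld as well, and the consumer's position stays at offset 1. Once the transaction completes, the next poll returns the rest in offset order, leaving out producer1's record if the transaction was aborted.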