apache-kafka apache-kafka-streams tombstone

When do KTable records expire if you don't tombstone them?


I have a topic T with message expiry (retention.ms) set to 2 days. The topic also has compaction enabled.

If I read a message from T into a KStream and then aggregate it into a KTable, will the KStream and/or KTable honour that 2-day expiry? When the message is no longer in topic T, will it also be removed from the KStream or KTable automatically? Or does some housekeeping process need to tombstone those messages?


Solution

  • delete.retention.ms, the topic's "dirty ratio" (min.cleanable.dirty.ratio), the min/max compaction lag (min.compaction.lag.ms, max.compaction.lag.ms), etc. are all properties that control how long keys remain before they are compacted away

    Yes, the stream/table should be updated automatically, but you may still have remnants of the data elsewhere, in changelog topics or state stores, since those are stored outside the original topic

    Regarding the first property, delete.retention.ms, the docs say that it

    gives a bound on the time in which a consumer must complete a read if they begin from offset 0 to ensure that they get a valid snapshot of the final stage (otherwise delete tombstones may be collected before they complete their scan).

    Therefore, if a stream/table has a time lag of less than delete.retention.ms, you should expect it to consume the tombstone records; if it has been running longer than this time, it may still hold data that has since been deleted
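
To make the tombstone behaviour discussed above concrete, here is a minimal sketch in plain Java. It is not Kafka Streams code: `TombstoneSketch` and `materialize` are hypothetical names, and a `LinkedHashMap` stands in for a table's state. It assumes only the usual convention that a record with a null value is a tombstone for its key, and it does not model retention or the log cleaner.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the Kafka or Kafka Streams API.
final class TombstoneSketch {

    // Replays keyed records in offset order into a table-like map.
    // A record with a null value is treated as a tombstone and removes the key.
    static Map<String, String> materialize(List<Map.Entry<String, String>> records) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : records) {
            if (record.getValue() == null) {
                table.remove(record.getKey());                 // tombstone: delete the key
            } else {
                table.put(record.getKey(), record.getValue()); // upsert: latest value wins
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> log = new ArrayList<>();
        log.add(new SimpleEntry<>("user-1", "v1"));
        log.add(new SimpleEntry<>("user-2", "v1"));
        log.add(new SimpleEntry<>("user-1", "v2"));
        log.add(new SimpleEntry<>("user-2", null)); // tombstone for user-2
        System.out.println(materialize(log));       // prints {user-1=v2}
    }
}
```

A reader that misses the tombstone for `user-2` (for example, because the tombstone was already collected after delete.retention.ms) would replay only the first three records and keep `user-2=v1`, which is the stale-data case described above.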