
Does a Kafka consumer read messages from the active segment of a partition?


Let us say I have a partition (partition-0) with 4 segments that are committed and eligible for compaction. Once compaction has run across all 4 segments, they will not contain any duplicate data.

Now, there is also an active segment which is not yet closed. If a consumer starts reading data from partition-0 in the meantime, does it also read the messages from the active segment?

Note: My goal is to not provide duplicate data to the consumer for a particular key.


Solution

  • Your concern is valid: the consumer will also read the messages from the active segment. Log compaction does not guarantee exactly one value for a particular key, but rather at least one.

    Here is how Log Compaction is introduced in the documentation:

    Log compaction ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition.

    However, you can try to make compaction run more frequently, so that the active and non-compacted segments stay as small as possible. This comes at a cost, because running the log cleaner takes up resources.

    Many topic-level configurations relate to log compaction. All the details can be looked up in the topic-level configuration section of the Kafka documentation.

    However, I am quite convinced that you will not be able to guarantee that your consumer never receives duplicates from a log-compacted topic.
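
As a rough illustration of the advice above about running compaction more frequently, the following sketch collects a few topic-level settings that influence how soon data becomes eligible for compaction. The config keys are real Kafka topic settings, but the class name, method name, and all values are hypothetical and chosen only for illustration; they are not recommendations and would need tuning against the extra log cleaner load.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds topic-level overrides that favour frequent compaction.
final class CompactedTopicSettings {

    // All values below are illustrative assumptions, not tuned recommendations.
    static Map<String, String> frequentCompaction() {
        Map<String, String> cfg = new LinkedHashMap<>();
        // Enable log compaction for the topic.
        cfg.put("cleanup.policy", "compact");
        // A lower dirty ratio makes the log eligible for cleaning sooner (default is 0.5).
        cfg.put("min.cleanable.dirty.ratio", "0.1");
        // Roll the active segment after 10 minutes; the active segment itself is never compacted.
        cfg.put("segment.ms", "600000");
        // Upper bound (15 minutes) on how long a message may remain ineligible for compaction.
        cfg.put("max.compaction.lag.ms", "900000");
        return cfg;
    }
}
```

A map like this could be passed to `NewTopic.configs(...)` from the Kafka admin client when creating the topic. Even with such settings, a consumer can still see more than one value for a key, because records in the active segment are not compacted.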