apache-kafka cap-theorem

Kafka: Can consumers read messages before all replicas are in sync?


I'm designing an event-driven distributed system.

One of the events we need to distribute requires (1) low latency and (2) high availability.

Durability of the message and consistency between replicas are not that important for this event type.

Reading the Kafka documentation, it seems that a message must be applied to the log of every in-sync replica for a partition before consumers can read it from any replica.

Is my understanding correct? If so, is there a way around it?


Solution

  • If configured improperly, consumers can read data that has not been written to a replica yet.

    As per the book,

    Data is only available to consumers after it has been committed to Kafka, meaning it was written to all in-sync replicas.

    If you have configured min.insync.replicas=1, Kafka accepts writes even when the leader is the only in-sync replica. In that case it will not wait for the other replicas to catch up before serving the data to consumers.

    The recommended value for min.insync.replicas depends on the type of application. If losing some data is acceptable, it can be 1; if the data is critical, you should configure it to a value greater than 1.
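
To make the rule quoted from the book concrete, here is a minimal sketch of how the readable offset depends on the in-sync set. It is a toy model, not Kafka code: the function names `high_watermark` and `accepts_acks_all_write`, the broker ids, and the offsets are all made up for illustration. It assumes consumers can read only up to the smallest log end offset among the in-sync replicas, and that a write sent with acks=all is rejected when the in-sync set is smaller than min.insync.replicas.

```python
def high_watermark(log_end_offsets, in_sync):
    """First offset consumers cannot read yet: the smallest log end
    offset among the in-sync replicas (toy model)."""
    return min(log_end_offsets[broker] for broker in in_sync)


def accepts_acks_all_write(in_sync, min_insync_replicas):
    """Whether a write sent with acks=all would be accepted (toy model)."""
    return len(in_sync) >= min_insync_replicas


# Hypothetical partition: broker 1 is the leader, 2 and 3 are followers.
log_end_offsets = {1: 10, 2: 7, 3: 4}

# All three replicas in sync: consumers wait for the slowest one.
print(high_watermark(log_end_offsets, {1, 2, 3}))

# Broker 3 has dropped out of the in-sync set: only broker 2 holds things back.
print(high_watermark(log_end_offsets, {1, 2}))

# Only the leader is in sync: everything it has written is readable.
print(high_watermark(log_end_offsets, {1}))

# With min.insync.replicas=1 the leader alone may accept acks=all writes;
# with min.insync.replicas=2 it may not.
print(accepts_acks_all_write({1}, 1), accepts_acks_all_write({1}, 2))
```

In this model a lower min.insync.replicas does not by itself let consumers read ahead of followers that are still in sync. It only lets the partition keep accepting writes once the in-sync set has shrunk to the leader.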

    There are two things you should consider:

    1. Is it all right if a message from the producer never reaches Kafka? (fire-and-forget strategy with acks=0)
    2. Is it all right if a consumer never reads a message? (with min.insync.replicas=1, you may lose some data if a broker goes down)
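
The two questions above can be turned into a small decision sketch. The helper `suggest_settings` is hypothetical and does not talk to Kafka. It only returns plain dictionaries keyed by the real setting names `acks` (producer) and `min.insync.replicas` (topic or broker). The value 2 for critical data is an assumption that presumes a replication factor of 3.

```python
def suggest_settings(data_is_critical: bool) -> dict:
    """Map the durability requirement to candidate Kafka settings.

    Hypothetical helper for illustration; the returned values are
    starting points, not a definitive recommendation.
    """
    if data_is_critical:
        return {
            # Wait for every in-sync replica to acknowledge the write.
            "producer": {"acks": "all"},
            # Assumes replication factor 3, so one broker can fail safely.
            "topic": {"min.insync.replicas": "2"},
        }
    return {
        # Fire and forget: the producer does not wait for any acknowledgement.
        "producer": {"acks": "0"},
        # The leader alone is enough for the partition to accept writes.
        "topic": {"min.insync.replicas": "1"},
    }


for critical in (False, True):
    print(critical, suggest_settings(critical))
```

For the low-latency, high-availability event described in the question, the non-critical branch is the one that applies, provided occasional message loss is acceptable.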