apache-kafka microservices publish-subscribe event-driven-design

How will Kafka manage the order of events with 1 topic and 1 consumer in 1 consumer group, when that consumer runs on multiple machines?


Recently I was given the following scenario:

  1. Events are being published to 1 topic.
  2. We have 1 consumer in 1 consumer group.
  3. To keep the consumption rate in line with the production rate, we are running 10 instances of that consumer on 10 different machines.

Rephrasing with the given data: let's say we have 1 producer publishing events at a rate of 10,000/second to a topic that has 1 partition. We have 1 consumer group with 1 consumer, BUT we run 10 instances of that same consumer on 10 machines to keep up with consumption (one consumer can only consume 1,000/second) and to improve performance on the consumer side.

I was told that we can't add consumers to the consumer group [up to this point it sounds sensible: since we have only one partition, there is no point in adding consumers to the group], so we are running 1 consumer on multiple instances.

Partition: P0, Consumer Group: G1, Consumer in Consumer Group: C1 G1, Instance Machine: I1, Consumer on instance: <C1 G1 I1>

Producer --> P0 --> G1[ {C1 G1 I1}, {C1 G1 I2}, ..., {C1 G1 I10} ]

Question 1: How do we ensure that each instance does not receive the same records?

Question 2: How do we guarantee ordering?


Solution

  • In Kafka's topic architecture, message ordering is guaranteed at the partition level, not across the entire topic.

    So if you have a multi-partition topic and a multi-threaded consumer group, ordering is guaranteed only per consumer thread, not across the whole group.

    Each thread takes 1 or more partitions (depending on the number of partitions versus consumer threads), so each thread is only aware of the messages in its own partitions, nothing more.

    I recommend going through the resources below for in-depth details about consumer groups and the ordering guarantee.
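The partition-level behaviour described in this answer can be illustrated with a small, broker-free Python sketch. It is only a toy model, not Kafka's implementation: a simplified round-robin `assign` stands in for Kafka's real partition assignors, and `zlib.crc32` stands in for the default producer's murmur2 key hash. The helper names (`assign`, `partition_for`, `produce`, `consume`) are hypothetical. The sketch shows that with the question's single partition P0, only one of the 10 instances in G1 is assigned anything (so no two instances get the same records, but throughput does not scale), and that with several partitions and keyed records each key's events stay in order on whichever instance owns that key's partition.

```python
# Toy, broker-free model of Kafka consumer-group semantics (hypothetical helpers).
# Assumptions: round-robin assignment stands in for Kafka's real assignors,
# and zlib.crc32 stands in for the default producer's murmur2 key hash.
import zlib
from collections import defaultdict


def assign(partitions, consumers):
    """Give each partition to exactly one consumer in the group.

    Consumers beyond the number of partitions get nothing: in Kafka a
    partition is read by at most one member of a consumer group.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


def partition_for(key, num_partitions):
    """Records with the same key always land in the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events, num_partitions):
    """Append (key, seq) events to per-partition logs in produce order."""
    logs = defaultdict(list)
    for key, seq in events:
        logs[partition_for(key, num_partitions)].append((key, seq))
    return logs


def consume(logs, assignment):
    """Each consumer reads only its own partitions, each in offset order."""
    return {
        consumer: [rec for p in sorted(parts) for rec in logs.get(p, [])]
        for consumer, parts in assignment.items()
    }


if __name__ == "__main__":
    instances = [f"C1-I{n}" for n in range(1, 11)]  # 10 instances, group G1

    # The question's setup: 1 partition (P0) shared by 10 instances of G1.
    single = assign([0], instances)
    print([c for c, parts in single.items() if parts])  # ['C1-I1']; 9 idle

    # With 10 partitions each instance owns one partition, and each key's
    # events stay in produce order on whichever instance owns that key.
    events = [(f"order-{n % 25}", n) for n in range(1000)]
    received = consume(produce(events, 10), assign(list(range(10)), instances))
    for consumer, records in received.items():
        per_key = defaultdict(list)
        for key, seq in records:
            per_key[key].append(seq)
        assert all(seqs == sorted(seqs) for seqs in per_key.values())
    print("per-key order preserved")
```

The design choice mirrors the answer's point: ordering comes from a key always mapping to one partition and a partition always belonging to one consumer, so scaling out means adding partitions (for example, at least 10 for 10 instances at 1,000/second each), not running more instances against one partition.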