Recently I was given a situation:
To rephrase with the given data: let's say we have 1 producer which is producing events at a rate of 10 thousand/second to a topic which has 1 partition. We have 1 consumer group with 1 consumer, BUT we run 10 instances of that same consumer on 10 machines in order to keep up with consumption (one consumer can consume only 1 thousand/second) and to increase performance on the consumer side.
I was told that we can't increase the number of consumers in the consumer group [up to here it sounds sensible: since we have only one partition, there is no point in adding consumers to the group], so we are running 1 consumer on multiple instances.
Partition: P0, Consumer Group: G1, Consumer in Consumer Group: C1 G1, Instance Machine: I1, Consumer on instance: <C1 G1 I1>
Producer --> P0 --> G1[ {C1 G1 I1}, {C1 G1 I2}, ..., {C1 G1 I10} ]
Question 1: How will we ensure that each instance is not getting the same records?
Question 2: How will we make sure the order is preserved?
In Kafka's topic architecture, message ordering is guaranteed at the partition level, not across the entire topic.
So if you have a multi-partition topic and a multi-threaded consumer group, ordering is only guaranteed within the partitions each consumer thread owns, not across the entire group.
Each thread takes one or more partitions (depending on the number of partitions versus the number of consumer threads), so each thread is only aware of the messages within its own partitions, nothing more.
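To make the partition-to-consumer relationship concrete, here is a small simulation in plain Python. It is only a sketch: `assign_partitions` is a hypothetical helper that mimics a round-robin style assignment and is not Kafka's actual assignor code. The one property it relies on is that, within a single consumer group, a partition is owned by exactly one consumer at a time. The instance names reuse the `C1 G1 I1` notation from the question.

```python
def assign_partitions(partitions, consumers):
    """Hypothetical round-robin sketch (not Kafka's real assignor).

    Every partition is handed to exactly one consumer of the group,
    so no two consumers in the same group read the same partition.
    """
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


# The 10 instances from the question, all in the same group G1.
consumers = [f"C1-G1-I{i}" for i in range(1, 11)]

# Case 1: a topic with a single partition P0.
single = assign_partitions(["P0"], consumers)
idle = [member for member, owned in single.items() if not owned]

# Case 2: the same group against a topic with 10 partitions.
ten = assign_partitions([f"P{i}" for i in range(10)], consumers)

print("owners of P0:", [m for m, owned in single.items() if owned])
print("idle instances with 1 partition:", len(idle))
print("partitions per instance with 10 partitions:",
      sorted(len(owned) for owned in ten.values()))
```

Under these assumptions, with one partition only one instance receives records and the other nine sit idle, so duplicates across instances of the same group are not the problem; the lack of parallelism is. With ten partitions every instance owns one partition, and ordering then holds inside each partition only.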
I recommend going through the resources below for in-depth details about consumer groups and the ordering guarantee.