I have a consumer C1 in consumer group G1 that reads from topic T1 in a poll loop. While it is polling, another consumer C2 joins the same group G1 but subscribes to a different topic, T2.
What I have observed is that the partitions of topic T1 are revoked and then re-assigned to the same consumer C1, which is expected because there is no other consumer for this topic. But my question is: why does a revoke happen in the first place when the other consumer subscribed to a different topic?
These are the log lines from consumer C1. At the same moment, consumer C2 is joining the group and subscribing to topic T2:
Revoking previously assigned partitions [T1-1, T1-0]
20/05/28 03:19:04 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-171, groupId=G1] (Re-)joining group
20/05/28 03:19:04 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-171, groupId=G1] Successfully joined group with generation 1117184
20/05/28 03:19:04 INFO internals.ConsumerCoordinator: [Consumer clientId=consumer-171, groupId=G1] Setting newly assigned partitions [T1-1, T1-0]
When a new consumer (C2) is added to a consumer group, the existing consumer (C1) cannot determine on its own whether a revoke is required, because C1 is not aware of the topic(s) that C2 is subscribed to.
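Hence, with the eager rebalance protocol, any change in group membership triggers a rebalance of the whole group, and all partitions are revoked from all existing consumers before they rejoin. One of the brokers acts as the group coordinator: it collects every member's subscription, a valid assignment is worked out behind the scenes (by the consumer elected as group leader, using the configured partition assignor), and the coordinator then communicates it to the consumers of the group. Since C1 is still the only subscriber of T1, it gets the same partitions back. You can read more about it here: https://medium.com/streamthoughts/apache-kafka-rebalance-protocol-or-the-magic-behind-your-streams-applications-e94baf68e4f2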
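To make the sequence concrete, here is a minimal toy model of an eager rebalance in plain Java. It is not Kafka client code: the class `EagerRebalanceSketch`, the helper `assign`, the round-robin placement, and the partition count of 2 for T2 are all assumptions made for illustration (only the two partitions of T1 come from the logs above).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of an eager rebalance; illustrative only, not the Kafka client API.
class EagerRebalanceSketch {

    // Hypothetical helper: recompute the whole group's assignment from scratch.
    // subscriptions:   member id -> topics that member subscribes to
    // partitionCounts: topic     -> number of partitions in that topic
    static Map<String, List<String>> assign(Map<String, Set<String>> subscriptions,
                                            Map<String, Integer> partitionCounts) {
        Map<String, List<String>> assignment = new TreeMap<>();
        for (String member : subscriptions.keySet()) {
            assignment.put(member, new ArrayList<>());
        }
        for (Map.Entry<String, Integer> topic : new TreeMap<>(partitionCounts).entrySet()) {
            // Only members subscribed to this topic are candidates for its partitions.
            List<String> subscribers = new ArrayList<>();
            for (String member : assignment.keySet()) {
                if (subscriptions.get(member).contains(topic.getKey())) {
                    subscribers.add(member);
                }
            }
            if (subscribers.isEmpty()) {
                continue; // nobody in the group wants this topic
            }
            for (int p = 0; p < topic.getValue(); p++) {
                // Simple round-robin among the topic's subscribers (an assumption).
                String owner = subscribers.get(p % subscribers.size());
                assignment.get(owner).add(topic.getKey() + "-" + p);
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // T1 has two partitions (from the logs); T2's count is assumed.
        Map<String, Integer> partitionCounts = Map.of("T1", 2, "T2", 2);

        Map<String, Set<String>> subscriptions = new LinkedHashMap<>();
        subscriptions.put("C1", Set.of("T1"));
        Map<String, List<String>> before = assign(subscriptions, partitionCounts);
        System.out.println("before C2 joins: " + before);

        // C2 joins. In the eager model every member first gives up what it owns,
        // because no member knows in advance what the newcomer subscribes to.
        System.out.println("C1 revokes:      " + before.get("C1"));
        subscriptions.put("C2", Set.of("T2"));

        // Only now, with all subscriptions known, is the new assignment computed.
        Map<String, List<String>> after = assign(subscriptions, partitionCounts);
        System.out.println("after C2 joins:  " + after);
    }
}
```

In this sketch C1 ends up with the same two T1 partitions it held before, which matches the "Revoking previously assigned partitions" / "Setting newly assigned partitions" pair in the logs: the revoke happens before anyone knows that C2's subscription does not overlap with C1's.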