apache-kafka

How can I scale Kafka consumers?


I'm reading the Kafka documentation and noticed the following line:

Note however that there cannot be more consumer instances in a consumer group than partitions.

Hmm. How can I auto-scale this?

For example, let's say I have a messaging system with hi/lo priorities, so I create a topic for messages and partitions for hi- and lo-priority messages.

If this were RabbitMQ, I'd have an auto-scalable group of consumers assigned to each partition, like this:

enter image description here

If I understand the Kafka model I can't have >1 consumer per partition in a consumer group, so that picture doesn't work for Kafka, right?

Ok, so what about >1 consumer groups like this:

enter image description here

That gets around Kafka's limitation but... If I understand how this works, both consumer groups would be pulling from a partition, for example msg.hi, with their own offsets, so neither would know about the other--meaning messages would likely be delivered twice!

How can I achieve the capability I had in the Rabbit design w/Kafka and still maintain the "queue-ness" of the behavior (I don't want to send a message twice)? What am I missing?


Solution

  • Just create a bunch of partitions for hi and lo. 12 is a good number. So is 60. Just pick a number of partitions that matches the maximum parallelism you want, since each partition is read by at most one consumer in a group.

    Honestly, although I personally would make msg.hi and msg.lo different topics entirely, that's not a requirement -- you can do custom partitioning to divide messages between partitions.
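
    To make the custom partitioning idea concrete, here is a minimal sketch in plain Python. It does not use any Kafka client API. It assumes one topic with 12 partitions, where partitions 0-5 are reserved for hi and 6-11 for lo; that split, and the names choose_partition and assign, are hypothetical and only for illustration. In a real producer, logic like choose_partition would go in whatever partitioner hook your client library provides. The assign function is a simplified round-robin stand-in for a group rebalance, not Kafka's actual assignor.

```python
import zlib

# Hypothetical layout: one topic, 12 partitions, split by priority.
HI_PARTITIONS = range(0, 6)    # partitions 0-5 carry hi-priority messages
LO_PARTITIONS = range(6, 12)   # partitions 6-11 carry lo-priority messages


def choose_partition(priority: str, key: bytes) -> int:
    """Pick a partition from the slice reserved for this priority.

    Anything that is not "hi" is treated as lo priority.
    """
    partitions = HI_PARTITIONS if priority == "hi" else LO_PARTITIONS
    # crc32 is stable across runs, unlike Python's built-in hash() on bytes.
    return partitions[zlib.crc32(key) % len(partitions)]


def assign(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin.

    Each partition goes to exactly one consumer, so no message is
    handed to two members of the same group.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


if __name__ == "__main__":
    # A hi-priority message always lands somewhere in partitions 0-5.
    print(choose_partition("hi", b"order-42"))
    # Six hi partitions shared by four consumers: everyone gets work.
    print(assign(list(HI_PARTITIONS), ["c1", "c2", "c3", "c4"]))
    # Two partitions but three consumers: the third one sits idle.
    print(assign([0, 1], ["c1", "c2", "c3"]))
```

    The last call shows why the partition count caps parallelism: once every partition has an owner, additional consumers in the group get nothing to read. Creating more partitions up front leaves room to add consumers later without delivering any message twice.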