I'm reading the Kafka documentation and noticed the following line:
Note however that there cannot be more consumer instances in a consumer group than partitions.
Hmm. How can I auto-scale this?
For example, let's say I have a messaging system with hi/lo priorities, so I create a topic for messages and partitions for hi and lo priority messages.
If this was RabbitMQ, I'd have an auto-scalable group of consumers assigned to each partition, like this:
If I understand the Kafka model I can't have >1 consumer per partition in a consumer group, so that picture doesn't work for Kafka, right?
Ok, so what about >1 consumer groups like this:
That gets around Kafka's limitation, but if I understand how this works, both consumer groups would be pulling from the same partition (for example msg.hi) with their own offsets, so neither would know about the other, meaning every message would be delivered twice!
How can I achieve the capability I had in the Rabbit design w/Kafka and still maintain the "queue-ness" of the behavior (I don't want to send a message twice)? What am I missing?
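Just create a bunch of partitions for hi and lo. 12 is a good number. So is 60. Pick a number of partitions that matches the maximum parallelism you want: within a consumer group each partition is read by at most one consumer, so the partition count is the ceiling on how far the group can scale out.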
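To make the "partitions cap your parallelism" point concrete, here is a toy simulation in plain Python. It is not Kafka's actual assignor, just a round-robin sketch under the assumption that each partition is handed to exactly one consumer in the group; the names assign_partitions, c0, c1, etc. are made up for illustration.

```python
def assign_partitions(partitions, consumers):
    """Round-robin sketch: every partition goes to exactly one consumer."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = list(range(12))  # 12 partitions, as suggested above

# 3 consumers: each one ends up owning 4 partitions.
small_group = assign_partitions(partitions, ["c0", "c1", "c2"])

# 15 consumers: only 12 can own a partition, the rest sit idle.
big_group = assign_partitions(partitions, [f"c{i}" for i in range(15)])
idle = [consumer for consumer, owned in big_group.items() if not owned]

print(small_group)
print("idle consumers:", idle)
```

So you can auto-scale the group anywhere from 1 up to 12 consumers and the partitions get spread across whoever is alive; consumers beyond the 12th just have nothing to read.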
Honestly, although I personally would make msg.hi and msg.lo different topics entirely, that's not a requirement -- you can do custom partitioning to divide messages between partitions.
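If you do keep a single topic, the custom partitioning could look roughly like the sketch below. The 4/8 split between hi and lo, and the names HI_PARTITIONS, LO_PARTITIONS and choose_partition, are my own assumptions for illustration, not anything Kafka prescribes. The function only computes a partition number; you would then hand that number to your producer client (most clients let you set the partition explicitly when sending).

```python
import zlib

# Hypothetical layout: 12 partitions in one topic,
# the first 4 reserved for hi priority, the other 8 for lo.
HI_PARTITIONS = range(0, 4)
LO_PARTITIONS = range(4, 12)


def choose_partition(priority, key):
    """Pick a partition inside the pool reserved for this priority.

    key is bytes; the same key always maps to the same partition,
    so per-key ordering is preserved within a priority level.
    """
    pool = HI_PARTITIONS if priority == "hi" else LO_PARTITIONS
    return pool[zlib.crc32(key) % len(pool)]


hi_partition = choose_partition("hi", b"order-42")
lo_partition = choose_partition("lo", b"order-42")
print(hi_partition, lo_partition)
```

With separate topics you get the same effect without any custom code, which is why I'd lean that way: each topic gets its own partition count and its own consumer group sizing.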