Suppose we have one topic with 4 partitions in Kafka, and 4 publishers that publish messages to the same topic.
Each publisher publishes a different number of messages: publisher1 publishes W messages, publisher2 publishes X messages, publisher3 publishes Y messages and publisher4 publishes Z messages.
How many messages are in each partition?
Unless your producers specifically write to certain partitions (by providing the partition number when constructing the ProducerRecord), each message will, by default, land in one of the partitions based on its key. Internally, the following logic is used:
org.apache.kafka.common.utils.Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
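where keyBytes is the byte representation of your (serialized) key and numPartitions is 4 in your case. If you are not using any key, the messages will be distributed in a round-robin fashion (clients from Kafka 2.4 onward instead stick to one partition per batch, which still spreads messages across partitions over time).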
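To make the key-based rule concrete, here is a minimal, standalone sketch that applies the formula above to a list of keys and tallies how many messages would land in each partition. It does not use the Kafka client library: `PartitionSketch`, `partitionFor`, and `countPerPartition` are hypothetical names, the sample keys are made up, and `murmur2` is an illustrative re-implementation of the 32-bit murmur2 variant, so for authoritative results call Kafka's own `Utils.murmur2`. It also assumes keys are strings serialized as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of Kafka.
final class PartitionSketch {

    // Illustrative re-implementation of 32-bit murmur2 (assumption: intended to
    // match the variant used by the Kafka client; verify against Utils.murmur2).
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;

        // Mix the input four bytes at a time (little-endian).
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff)
                    + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16)
                    + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Mix the remaining 1 to 3 bytes; the fall-through is intentional.
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }

        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Clears the sign bit so the modulo result is never negative.
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // Applies toPositive(murmur2(keyBytes)) % numPartitions to a string key.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8); // assumed serialization
        return toPositive(murmur2(keyBytes)) % numPartitions;
    }

    // Tallies how many of the given keyed messages fall into each partition.
    static long[] countPerPartition(List<String> keys, int numPartitions) {
        long[] counts = new long[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up keys; which producer sent a message plays no role here.
        List<String> keys = Arrays.asList(
                "order-1", "order-2", "order-1", "user-7", "user-8", "order-1");
        long[] counts = countPerPartition(keys, 4);
        for (int p = 0; p < counts.length; p++) {
            System.out.println("partition " + p + ": " + counts[p]);
        }
    }
}
```

Note that the producer's identity never enters the calculation: messages with the same key end up in the same partition regardless of which of the 4 publishers sent them, so the per-partition counts depend on the key distribution rather than on W, X, Y and Z individually.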
Therefore, it is not possible to predict how many messages are in each partition without knowing the keys being used (if keys are used at all).
More on the partitioning of messages is given here.