[SOLVED] What's the purpose of Kafka's key/value pair-based messaging?

What's the purpose of Kafka's key/value pair-based messaging?

All of the examples of Kafka | producers show the ProducerRecord's key/value pair as not only being the same type (all examples show <String,String>), but the same value. For example:

producer.send(new ProducerRecord<String, String>("someTopic", Integer.toString(i), Integer.toString(i)));

But in the Kafka docs, I can't seem to find where the key/value concept (and its underlying purpose/utility) is explained. In traditional messaging (ActiveMQ, RabbitMQ, etc.) I've always fired a message at a particular topic/queue/exchange. But Kafka is the first broker that seems to require key/value pairs instead of just a regulare 'ole string message.

So I ask: What is the purpose/usefulness of requiring producers to send KV pairs?

Solution

Kafka uses the abstraction of a distributed log that consists of partitions. Splitting a log into partitions allows to scale-out the system.

Keys are used to determine the partition within a log to which a message get's appended to. While the value is the actual payload of the message. The examples are actually not very "good" with this regard; usually you would have a complex type as value (like a tuple-type or a JSON or similar) and you would extract one field as key.

See: http://kafka.apache.org/intro#intro_topics and http://kafka.apache.org/intro#intro_producers

In general the key and/or value can be null, too. If the key is null a random partition will the selected. If the value is null it can have special "delete" semantics in case you enable log-compaction instead of log-retention policy for a topic (http://kafka.apache.org/documentation#compaction).