I am new to Kafka and have been reading about it for a while. I found this information in the Confluent documentation:
https://docs.confluent.io/current/streams/architecture.html
What I understood from this is the following. Say I have one broker with a single topic called plain_text, which has a single partition, and I send it a bunch of plain-text records. I now start 2 consumer instances, ConsumerA and ConsumerB. Since the partition count is less than the consumer count, only one of the consumers should actively consume messages, leaving the other idle. Please correct me if I am wrong.
I ran a test using the kafka-console-* scripts:
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create \
--bootstrap-server localhost:9092 \
--replication-factor 1 \
--partitions 1 \
--topic plain_text
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic plain_text
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic plain_text \
--formatter kafka.tools.DefaultMessageFormatter \
--property print.key=true \
--property print.value=true \
--property group.id=test_group
So one of the two consumers should own that single partition (again, please correct me if I am wrong), but whatever I produce on the producer console is visible on both consumer consoles. Why are both consumers consuming messages from a single partition? Is there something I am missing, or do different rules apply to the kafka-console-* scripts?
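The --property option only configures the message formatter, so group.id=test_group never reaches the consumer itself. If no group is specified, each kafka-console-consumer run creates its own consumer group id, so your two consumers end up in different groups and each one receives every message. You can check this using: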
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list | grep console-consume
console-consumer-68642
console-consumer-30430
To set the group.id explicitly for your console consumers, add --group your_group_name or --consumer-property group.id=your_group_name.
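As a minimal sketch of the fix, assuming the same local broker (localhost:9092) and topic (plain_text) as in the question and reusing test_group as the group name, the consumer command could look like this. The second command is only a suggested way to check which consumer holds the partition.

```shell
# Run this same command in two terminals so that both consumers join "test_group".
# --group sets the consumer's group.id; --property only configures the formatter.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic plain_text \
  --property print.key=true \
  --property print.value=true \
  --group test_group

# In a third terminal, describe the group to see which member is assigned partition 0.
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group test_group
```

With both consumers in the same group and only one partition, only one console should print the produced records. The other should stay idle until the first one stops and the group rebalances.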