
Can Kafka partitions be spread across multiple Kafka cluster nodes?


My application has a list of Kafka cluster nodes specified in the spring.kafka.bootstrap-servers property and listens to topics on all of these nodes.

If I were to create a topic on one of these nodes with, let's say, 5 partitions, will these partitions be spread across the nodes or will they all be created on a single node? Also, how can I find out which node a topic partition actually exists on?


Solution

  • You don't actually create a topic on one specific node of a Kafka cluster. When you issue a request to create a topic, Kafka automatically spreads its partitions across the nodes (brokers) belonging to the cluster, and the replicas of each partition are spread out as well. That is how Kafka handles high availability: if one of the nodes is down and the replication factor is greater than 1, some other node holds a copy of the required data and can take over, so there is little or no impact on users of the cluster.
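
    To picture how partitions and their replicas end up on different brokers, here is a minimal sketch of a round-robin placement. It is a simplified model, not Kafka's actual assignment code (the real broker logic also randomizes the starting broker and can take racks into account), and the class and method names are made up for illustration.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical, simplified model of spreading partition replicas over brokers.
    // This is NOT Kafka's real algorithm; it only illustrates the round-robin idea.
    class PartitionSpreadSketch {

        // Returns, for each partition, the ordered list of broker ids holding a replica.
        // The first id in each list plays the role of the preferred leader.
        static List<List<Integer>> assign(int partitions, int replicationFactor, List<Integer> brokerIds) {
            if (replicationFactor > brokerIds.size()) {
                // A partition cannot have two replicas on the same broker.
                throw new IllegalArgumentException("replication factor exceeds broker count");
            }
            List<List<Integer>> assignment = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                List<Integer> replicas = new ArrayList<>();
                for (int r = 0; r < replicationFactor; r++) {
                    // Shift by the partition number so leaders rotate across brokers.
                    replicas.add(brokerIds.get((p + r) % brokerIds.size()));
                }
                assignment.add(replicas);
            }
            return assignment;
        }

        public static void main(String[] args) {
            // 5 partitions, replication factor 2, on an assumed 3-broker cluster (ids 0, 1, 2).
            List<List<Integer>> assignment = assign(5, 2, Arrays.asList(0, 1, 2));
            for (int p = 0; p < assignment.size(); p++) {
                System.out.println("Partition " + p + " -> replicas " + assignment.get(p));
            }
        }
    }
    ```

    With these inputs the sketch places partition 0 on brokers [0, 1], partition 1 on [1, 2], partition 2 on [2, 0], partition 3 on [0, 1] and partition 4 on [1, 2], so every broker holds some partitions and no partition lives on a single node only.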

    You can issue a --describe command like this:

    > bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic
    
        Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
            Topic: my-replicated-topic  Partition: 0    Leader: 1   Replicas: 1,2,0 Isr: 1,2,0
    

    That will give you a list of the partitions for your topic, where they are located (Replicas), which node is the leader for each partition (the one consumers are told to consume from when they need data from that partition), and some more info like the in-sync replicas (ISR) and the replication factor. In the output above, partition 0 is led by broker 1 and replicated on brokers 1, 2 and 0, all of which are in sync.
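
    If you want to read those fields from Java, for example in a small script that wraps the CLI, the following sketch parses one partition line of the --describe output shown above. It assumes the line layout from that sample (newer Kafka versions may append extra fields), and the class and method names are hypothetical. In an application that already depends on kafka-clients, the admin client's describeTopics call exposes the same leader, replica and ISR information without any text parsing.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper: parses one partition line of `kafka-topics.sh --describe` output.
    class DescribeLineParser {

        // Matches e.g. "Topic: my-replicated-topic  Partition: 0  Leader: 1  Replicas: 1,2,0  Isr: 1,2,0"
        private static final Pattern LINE = Pattern.compile(
            "Topic:\\s*(\\S+)\\s+Partition:\\s*(\\d+)\\s+Leader:\\s*(-?\\d+)"
                + "\\s+Replicas:\\s*([\\d,]+)\\s+Isr:\\s*([\\d,]*)");

        static final class PartitionInfo {
            final String topic;
            final int partition;
            final int leader;             // broker id currently leading the partition
            final List<Integer> replicas; // broker ids holding a copy of the partition
            final List<Integer> isr;      // replicas that are currently in sync

            PartitionInfo(String topic, int partition, int leader,
                          List<Integer> replicas, List<Integer> isr) {
                this.topic = topic;
                this.partition = partition;
                this.leader = leader;
                this.replicas = replicas;
                this.isr = isr;
            }

            // True when at least one replica has fallen out of the in-sync set.
            boolean isUnderReplicated() {
                return isr.size() < replicas.size();
            }
        }

        static PartitionInfo parse(String line) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                throw new IllegalArgumentException("not a partition line: " + line);
            }
            return new PartitionInfo(m.group(1), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), ids(m.group(4)), ids(m.group(5)));
        }

        // Turns "1,2,0" into [1, 2, 0]; an empty string yields an empty list.
        private static List<Integer> ids(String csv) {
            List<Integer> out = new ArrayList<>();
            for (String s : csv.split(",")) {
                if (!s.isEmpty()) {
                    out.add(Integer.parseInt(s));
                }
            }
            return out;
        }

        public static void main(String[] args) {
            PartitionInfo info = parse(
                "Topic: my-replicated-topic  Partition: 0    Leader: 1   Replicas: 1,2,0 Isr: 1,2,0");
            System.out.println("Partition " + info.partition + " of " + info.topic
                + ": leader=" + info.leader + ", replicas=" + info.replicas + ", isr=" + info.isr);
        }
    }
    ```

    The isUnderReplicated check is a small convenience on top of the parsed fields: a partition whose ISR list is shorter than its replica list has at least one copy that is lagging or offline.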

    There's more info in the official Kafka documentation.

    Bear in mind that when your client connects to the bootstrap servers, it is not specifying a complete list of brokers from which to read data. It is just specifying one or more brokers from which to pull information (metadata) about the cluster. When the client reads from or writes to a given topic and partition, it does so directly against the relevant broker that holds that data, regardless of the particular brokers specified in the bootstrap list. You can see more about this process here and here.