Basically the title.
We've been running Kafka with ZooKeeper for years. The setup was to form a ZooKeeper cluster and then let the brokers connect to it. Everything is in Puppet. There were no extra steps, and after some restarts the cluster eventually came up.
Now we're migrating to KRaft (in place), and I noticed I need to format the storage on every controller using the kafka-storage.sh utility. I thought this was only part of the KRaft migration process, but apparently it is needed even when new KRaft clusters are built from scratch, as per https://kafka.apache.org/documentation/#quickstart
I guess I need to swallow my 'why they couldn't just automate it as part of boot up process', but I have some follow-up questions I can't find answers to:
I've been looking for this information as well.
The best resource I have found that explains these steps in the context of a multi-node cluster is from Confluent:
Quoting the relevant text from the above doc to answer your questions:
you must create a unique cluster ID and format the log directories with that ID...Before you start Kafka, you must use the kafka-storage tool with the random-uuid command to generate a cluster ID for each new cluster. You only need one cluster ID, which you will use to format each node in the cluster.
use the cluster ID to format storage for each node in the cluster with the kafka-storage tool
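To make the two quoted steps concrete, here is a minimal Python sketch of how a wrapper (something Puppet could call, for example) might drive them. The install paths and function names are my own assumptions, not anything from the Confluent doc. The `--ignore-formatted` flag is there so a repeated run does not fail on a node that is already formatted.

```python
import subprocess

# Hypothetical install paths; adjust to your own layout.
KAFKA_STORAGE = "/opt/kafka/bin/kafka-storage.sh"
CONFIG = "/opt/kafka/config/kraft/server.properties"


def random_uuid_command(tool=KAFKA_STORAGE):
    """Command that prints a new cluster ID. Run once per cluster."""
    return [tool, "random-uuid"]


def format_command(cluster_id, config=CONFIG, tool=KAFKA_STORAGE):
    """Command that formats this node's log directories. Run on every node."""
    return [
        tool, "format",
        "--cluster-id", cluster_id,
        "--config", config,
        "--ignore-formatted",  # do not fail if already formatted
    ]


def generate_cluster_id(tool=KAFKA_STORAGE):
    """Run the tool and return the cluster ID it prints."""
    result = subprocess.run(random_uuid_command(tool), check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def format_node(cluster_id, config=CONFIG, tool=KAFKA_STORAGE):
    """Format the local node using the shared cluster ID."""
    subprocess.run(format_command(cluster_id, config, tool), check=True)
```

The idea would be to call `generate_cluster_id()` exactly once, store the result somewhere every node can read it (Hiera, for instance), and then have each node call `format_node()` with that same value before Kafka starts.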
But with regard to your other meta-question, 'why they couldn't just automate it as part of boot up process':
Previously, Kafka would format blank storage directories automatically and generate a new cluster ID automatically. One reason for the change is that auto-formatting can sometimes obscure an error condition. This is particularly important for the metadata log maintained by the controller and broker servers. If a majority of the controllers were able to start with an empty log directory, a leader might be able to be elected with missing committed data.
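The quorum arithmetic behind that last sentence can be sketched in a few lines of Python. This is a deliberately simplified model of my own (it only counts votes and ignores the details of the real KRaft election), with hypothetical function names:

```python
def majority(n_controllers):
    """Smallest number of votes that wins an election."""
    return n_controllers // 2 + 1


def blank_leader_possible(n_controllers, n_blank):
    """True if controllers with empty logs can form a majority on their own."""
    return n_blank >= majority(n_controllers)


# With 3 controllers, a majority is 2 votes.
for blank in range(4):
    print(blank, blank_leader_possible(3, blank))
```

In a 3-controller quorum, two nodes that silently auto-formatted their empty directories would be enough to elect a leader between themselves, even though the third node still holds the committed metadata. Requiring an explicit format step turns that situation into a visible startup error instead.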