apache-kafkaupgrade

During rolling upgrade/restart, how to detect when a kafka broker is "done"?


I need to automate a rolling restart of a kafka cluster (3 kafka brokers). I can easily do it manually - restart one after the other, while checking the log to see when it's fine (e.g., when the new process has joined the cluster).

What is a good way to automate this check? How can I ask the broker whether it's up and running, connected to its peers, all topics up-to-date and such? In my restart script, I have access to the metrics, but to be frank, I did not really see one there which gives me a clear picture.

Another way would be to ask what a good "readyness" probe would be that does not simply check some TCP/IP port, but looks at the actual server...


Solution

  • I would suggest exposing JMX metrics and tracking the following for cluster health

    Also, Yelp has tooling for rolling restarts implemented in Python, which requires Jolokia JMX Agents installed on the brokers, and it polls the metrics to make sure some of the above conditions are true