I am ingesting data into Druid from a Kafka topic. Now I want to migrate that topic to a new Kafka cluster. What are the possible ways to do this without duplicating data and without downtime?
I have considered the following possible ways to migrate the topic to the new Kafka cluster.
Note: Druid manages Kafka topic offset in its metadata.
Druid Version: 0.22.1
Old Kafka Cluster Version: 2.0
You can follow these steps:
1- On the new cluster, create the new topic (it can have the same name or a new one, it doesn't matter).
2- Change your application config so that it sends messages to the new Kafka cluster.
3- Wait until Druid has consumed all messages from the old Kafka cluster. You can confirm this by checking the supervisor's lag and offset info.
4- Suspend the supervisor, and wait for its tasks to publish their segments and exit successfully.
5- Edit the supervisor spec for the datasource: make sure useEarliestOffset is set to true, and change the connection info so that it consumes from the new Kafka cluster (and the new topic name, if it isn't the same).
6- Save the spec and resume the supervisor. Druid will fail when it checks its stored offsets, because it cannot find them in the new Kafka cluster, and it will then start from the beginning of the new topic.
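As a rough sketch of steps 3 to 5 above, the Python helpers below work on plain dicts that you would fetch from and post back to the Overlord yourself. They assume the supervisor status JSON exposes `aggregateLag`, `activeTasks` and `publishingTasks` under `payload`, and that the spec keeps `ioConfig` either under a top-level `spec` key or at the top level. Please verify both against your own 0.22.1 responses. The function names are made up for illustration.

```python
import copy


def caught_up(status):
    """Step 3: True when the supervisor reports zero total lag.

    Assumes `status` is the JSON from
    GET /druid/indexer/v1/supervisor/<id>/status.
    """
    return status.get("payload", {}).get("aggregateLag") == 0


def tasks_finished(status):
    """Step 4: True when no reading or publishing tasks remain after suspend."""
    payload = status.get("payload", {})
    return not payload.get("activeTasks") and not payload.get("publishingTasks")


def retarget_spec(spec, bootstrap_servers, topic=None):
    """Step 5: return a copy of the supervisor spec aimed at the new cluster.

    The original dict is left untouched so it can be kept as a rollback copy.
    """
    new_spec = copy.deepcopy(spec)
    # Assumption: ioConfig sits under "spec" if that key exists,
    # otherwise directly at the top level of the supervisor spec.
    body = new_spec.get("spec", new_spec)
    io_config = body["ioConfig"]
    io_config.setdefault("consumerProperties", {})["bootstrap.servers"] = bootstrap_servers
    if topic is not None:
        io_config["topic"] = topic
    io_config["useEarliestOffset"] = True
    return new_spec
```

The updated dict can then be submitted with POST /druid/indexer/v1/supervisor, and the supervisor suspended or resumed with POST /druid/indexer/v1/supervisor/<id>/suspend and /resume. The Druid console does the same thing through its supervisor actions.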