Question (TL;DR;)
What I am looking for is a way to tell kudu to replicate data away from a directory (/data/0 in the context below), or to decommission a directory. Is it possible?
Context
I have a kudu setup with multiple data directories (all on different disks), e.g. /data/0, /data/1, /data/2.
Currently the WALs are on /data/0, as well as the kudu tablets, the HDFS directory and the YARN local dir. Long story short, this disk is overloaded and I want to migrate everything except the WALs away from it.
This question relates to the kudu tablet directory. I know how to force remove a disk from the doc but:
If --force is specified, all tablets configured to use that directory will fail upon starting up and be replicated elsewhere.
That sounds OK-ish (tablets will eventually be replicated) but I happen to have a few tables with a replication factor of 1, so those ones would be completely destroyed.
Workarounds
I am aware of a few workarounds, but none of them is ideal:
1. Use kudu tablet change_config move_replica to move the tablets of the tables with RF 1 from e.g. server 1 to server 2, then remove the directory on server 1, rebalance, then rinse and repeat from server 2 to 3 and then 3 to 1 (I have only 3 servers).
2. Move /data/0 inside /data/1 (the configuration actually does not use the whole disk, but a subdirectory there), but /data/1 would then receive twice as many IOs.
Sanity checks
First, make sure that there are no tables with a replication factor of 1. If some tablets of such a table happen to be on the disk you remove, the table will become unavailable. Note that the user running this command must be in Kudu's superuser_acl list (and of course replace ${kudu_master_host} with the real hostname).
kudu cluster ksck ${kudu_master_host} | grep '| 1 |' | cut -d' ' -f2
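The grep above can also match other columns that happen to contain 1. As a rough alternative, here is a sketch that only looks at the RF column. It assumes the ksck table summary is a pipe-separated table whose header has Name as first column and RF as second; rf1_tables is a helper name I made up, not a kudu command.

```shell
# Print the names of tables whose RF column is exactly 1.
# Assumes a pipe-separated summary with a "Name | RF | ..." header row.
rf1_tables() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    /^[ \t]*$/        { in_table = 0; next }   # a blank line ends the table
    NF < 2            { next }                 # titles and separator rows
    trim($2) == "RF"  { in_table = 1; next }   # header of the table summary
    in_table && trim($2) == "1" { print trim($1) }
  '
}

# Usage (not run here):
#   kudu cluster ksck "${kudu_master_host}" | rf1_tables
```

If the output is empty, no table in the summary has a replication factor of 1.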
If there are tables there, you need to
Start a rebalance. After this the data will be properly spread, and more importantly we know that rebalance can happen.
kudu cluster rebalance ${kudu_master_host}
Stop kudu.
Remove a disk
Note: do this one node at a time! It should be possible to do 2 at a time, but I haven't tested it. If you use Cloudera Manager, you need to use config groups.
Remove the path of the directory you want to get rid of from fs_data_dirs.
While kudu is still stopped, tell kudu on the tablet server whose configuration you just changed that there is now one disk less:
sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=<your wal directory> --fs_data_dirs=<comma separated list of remaining directories>
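To make the placeholders concrete, here is a small sketch that builds the list of remaining directories and prints the command for review instead of running it. The paths are the ones from the question and only stand in for your real directories (which may be subdirectories); remaining_dirs is a hypothetical helper, not part of kudu.

```shell
# Drop one directory from a comma-separated list of directories.
remaining_dirs() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -Fxv -- "$2" | paste -sd, -
}

# Placeholder paths: the WALs stay on /data/0, tablet data leaves it.
data_dirs=$(remaining_dirs "/data/0,/data/1,/data/2" "/data/0")

# Only print the command; run it yourself once it looks right.
echo "sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=/data/0 --fs_data_dirs=${data_dirs}"
```

Printing first is deliberate: with --force a typo in the list is not something you want to find out about after the restart.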
Restart kudu. Data will be automatically rebalanced.
Congrats, go to your next node once all tablets are happy (kudu cluster ksck ${kudu_master_host} does not return any error).
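If you do not want to re-run ksck by hand, something like the following could poll it for you. This is a sketch that assumes ksck exits with a non-zero status while it still reports errors; wait_until_ok is a name I made up, and the interval and number of tries are arbitrary.

```shell
# wait_until_ok INTERVAL_SECONDS MAX_TRIES COMMAND...
# Re-run COMMAND until it succeeds, giving up after MAX_TRIES failures.
wait_until_ok() {
  interval=$1
  max_tries=$2
  shift 2
  tries=0
  until "$@" >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      echo "still failing after ${tries} tries: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Usage (not run here): check every minute, for up to two hours.
#   wait_until_ok 60 120 kudu cluster ksck "${kudu_master_host}"
```

Only move on to the next node when this returns successfully; if it gives up, look at the full ksck output before touching anything else.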