sharding, tarantool, rebalancing

How to add another shard to a production Tarantool database without downtime?


We use a Tarantool database (sharded using vshard) in production. We started with 4 shards, and now we want to grow to 6 without downtime. However, after we add the two new shards, the rebalancer kicks in and doesn't allow reads/writes to happen. Is there any way to rebalance while still supporting all kinds of operations? We can afford longer operation times, but the operations must succeed. What is the best practice for adding a shard to Tarantool with the least inconvenience on the product side?

Currently, the only solution we can think of is to go into maintenance mode and let the rebalancing finish as quickly as possible.


Solution

  • You cannot write to a bucket that is being transferred right now, but you can write to other buckets (so it's not as if the whole shard is locked).

    Moreover, you can mitigate the effect by:
    - making buckets smaller (increase bucket_count);
    - making rebalancing slower, so that fewer buckets are transferred simultaneously (rebalancer config).

    Suppose you have 16384 buckets and your dataset is 75 GB. That means the average bucket size is around 5 MB (75 × 1024 MB / 16384 ≈ 4.7 MB). If you decrease the rebalancer_max_receiving parameter to 10, each receiving replica set will accept only 10 buckets (about 50 MB) at a time, and only those in-transfer buckets are locked for writes.

    This way, rebalancing will be pretty slow, BUT, given that your clients can retry and the network between shards is fast enough, the 'write-lock' effect should go entirely unnoticed.
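To make the 16384-bucket arithmetic concrete, here is a small Python sketch that computes the average bucket size and how much data one receiving replica set holds in transfer (and therefore write-locked) at once. It assumes data is spread evenly over buckets, which real datasets only approximate; the function names are illustrative and not part of vshard.

```python
# Back-of-envelope estimate of write-locked data during vshard rebalancing.
# Assumption: all buckets are roughly the same size.

MB_PER_GB = 1024


def avg_bucket_size_mb(dataset_gb: float, bucket_count: int) -> float:
    """Average bucket size in MB if data is spread evenly over buckets."""
    return dataset_gb * MB_PER_GB / bucket_count


def in_flight_mb(dataset_gb: float, bucket_count: int, max_receiving: int) -> float:
    """Upper bound on data being received (write-locked) by one replica set."""
    return max_receiving * avg_bucket_size_mb(dataset_gb, bucket_count)


if __name__ == "__main__":
    dataset_gb = 75
    # More buckets means smaller buckets, so less data is locked per transfer.
    for bucket_count in (3000, 16384, 30000):
        size = avg_bucket_size_mb(dataset_gb, bucket_count)
        locked = in_flight_mb(dataset_gb, bucket_count, max_receiving=10)
        print(f"bucket_count={bucket_count:>6}: ~{size:.1f} MB/bucket, "
              f"~{locked:.0f} MB locked per receiver")
```

The loop shows the bucket_count lever: for the same dataset, raising the bucket count shrinks each bucket and, with it, the volume that is write-locked per receiver at any moment.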
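The price of a smaller rebalancer_max_receiving is a longer rebalancing window. The sketch below is a deliberately simplified model of going from 4 to 6 shards: it assumes the old shards are already balanced, the target is an even split, and each new shard fills up in sequential rounds of at most max_receiving buckets. It ignores how the real vshard rebalancer actually schedules transfers and any other limits it applies, so treat the results as orders of magnitude only; the helper names are hypothetical.

```python
import math


def buckets_to_move(bucket_count: int, old_shards: int, new_shards: int) -> int:
    """Buckets that must leave the old shards so every shard ends up with an
    (approximately) equal share. Assumes the old shards start balanced."""
    target_per_shard = bucket_count / new_shards
    added = new_shards - old_shards
    return math.ceil(added * target_per_shard)


def transfer_rounds(bucket_count: int, new_shards: int, max_receiving: int) -> int:
    """Rough number of sequential receive rounds per new shard, if each new
    shard accepts at most max_receiving buckets at a time."""
    per_new_shard = math.ceil(bucket_count / new_shards)
    return math.ceil(per_new_shard / max_receiving)


if __name__ == "__main__":
    print("buckets to move:", buckets_to_move(16384, old_shards=4, new_shards=6))
    for max_receiving in (10, 100):
        rounds = transfer_rounds(16384, new_shards=6, max_receiving=max_receiving)
        print(f"max_receiving={max_receiving}: ~{rounds} rounds per new shard")
```

Under these assumptions roughly a third of all buckets must move, and dropping max_receiving from 100 to 10 multiplies the number of rounds by about ten while cutting the locked volume per round by the same factor.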
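The answer relies on clients retrying writes that hit a bucket in transfer. A minimal client-side sketch of that, using exponential backoff with jitter, is shown below. TransientBucketError is a hypothetical placeholder, not a real vshard or connector class: check which errors your router or driver actually returns for an in-transfer bucket and map only those to a retry.

```python
import random
import time


class TransientBucketError(Exception):
    """Hypothetical error for a write that hit a bucket being transferred."""


def with_retries(op, attempts=5, base_delay=0.05, max_delay=1.0, sleep=time.sleep):
    """Call op(); on TransientBucketError, back off exponentially with jitter."""
    for attempt in range(attempts):
        try:
            return op()
        except TransientBucketError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_write():
        # Simulates a write that fails twice while its bucket is in transfer.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientBucketError("bucket is being transferred")
        return "ok"

    print(with_retries(flaky_write))  # prints "ok" after two retries
```

Retrying is only safe if the write is idempotent (or deduplicated on the server side), since a write that times out may still have been applied. With small buckets and a low rebalancer_max_receiving, a transfer should be short, so a handful of attempts with sub-second backoff is usually enough headroom.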