Tags: sharding, database-partitioning, system-design, consistent-hashing

How are failures and restore operations handled in sharding (consistent hashing)?


In consistent hashing, suppose we are using the username as the key for hashing: hashFunction(username) = nodeA.

Now, from what I understand, if there is a failure or a node is removed, requests will be directed to the next node in the ring. Suppose nodeA failed and its write requests were directed to nodeB, so the data is now written on shardB. How will this be handled once nodeA is up and running again? Our hash function will still return hashFunction(username) = nodeA, but the data will not be available on nodeA/shardA since it was written to shardB.


Solution

  • Consistent hashing minimizes the number of keys that need to be remapped when nodes are added to or removed from a cluster. By itself, consistent hashing does not deal with a node being down; that's where the concept of the "next node" comes in.

    Let me provide an example!

    Let's say our hashing function is key mod 4, so for every integer key the mod operation returns one of [0, 1, 2, 3].

    Let's assume our cluster has just two nodes, A and B, and the mapping is A -> [0, 1] and B -> [2, 3]. Now we add another node C and remap a key range to it, for instance C -> [3], so the whole cluster is A -> [0, 1], B -> [2] and C -> [3]. With consistent hashing, only the affected keys ([3]) need to be moved; all others stay on their existing nodes.
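The remapping example above can be sketched in a few lines of Python. The names `cluster` and `assign` are assumptions for illustration, not part of any real library:

```python
# Hypothetical sketch of the example above: the hash is key mod 4,
# and each of the four slots is owned by exactly one node.

def assign(key, slot_to_node):
    """Map an integer key to the node owning slot (key mod 4)."""
    return slot_to_node[key % 4]

# Two-node cluster: A owns slots 0 and 1, B owns slots 2 and 3.
cluster = {0: "A", 1: "A", 2: "B", 3: "B"}
before = {k: assign(k, cluster) for k in range(8)}

# Add node C by handing it slot 3 only.
cluster[3] = "C"
after = {k: assign(k, cluster) for k in range(8)}

# Only keys that hash to slot 3 change ownership; everything else stays put.
moved = [k for k in range(8) if before[k] != after[k]]
print(moved)  # [3, 7] — the keys whose slot was reassigned to C
```

This is the whole point of consistent hashing: adding a node disturbs only the keys in the slots it takes over.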

    Now, with a cluster of three nodes, let's assume node B failed. B is still responsible for keys [2], so if we want to write (or read) a value, we can't: the node is offline. In this case we use the "next" node on the ring as a placeholder.
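The "next node" fallback can be sketched as walking clockwise around the ring past any offline node. The `Ring` class and its method names below are assumptions for illustration:

```python
import bisect
import hashlib

def h(s):
    """Hash a string to a point on the ring (MD5 used only as an example)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Each node is placed at one point on the ring, sorted by hash.
        self.points = sorted((h(n), n) for n in nodes)
        self.down = set()  # nodes currently known to be offline

    def node_for(self, key):
        """Start at the key's position and walk clockwise,
        skipping offline nodes, until a live node is found."""
        positions = [p for p, _ in self.points]
        i = bisect.bisect(positions, h(key)) % len(self.points)
        for step in range(len(self.points)):
            _, node = self.points[(i + step) % len(self.points)]
            if node not in self.down:
                return node
        raise RuntimeError("no live nodes in the ring")

ring = Ring(["A", "B", "C"])
primary = ring.node_for("alice")
ring.down.add(primary)            # simulate the key's primary node failing
fallback = ring.node_for("alice")
assert fallback != primary        # writes now go to the next live node
```

Note this only answers *where* requests go while a node is down; getting the data back onto the original node when it returns (the question above) still requires a separate mechanism, such as hinted handoff or read repair.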

    To summarize: