Tags: java, guava, consistent-hashing

How should I use Guava's Hashing#consistentHash?


I'm looking into using a consistent hash algorithm in some Java code I'm writing. Guava's Hashing library has a consistentHash(HashCode, int) method, but the documentation is rather lacking. My initial hope was that I could just use consistentHash() for simple session affinity to efficiently distribute load across a set of backend servers.

Does anyone have a real-world example of how to use this method? In particular I'm concerned with managing the removal of a bucket from the target range.

For example:

import static java.nio.charset.StandardCharsets.UTF_8;

import com.google.common.collect.Lists;
import com.google.common.hash.Hashing;
import java.util.List;
import org.junit.Test;

@Test
public void testConsistentHash() {
    List<String> servers = Lists.newArrayList("server1", "server2", "server3", "server4", "server5");

    int bucket = Hashing.consistentHash(Hashing.md5().hashString("someId", UTF_8), servers.size());
    System.out.println("First time routed to: " + servers.get(bucket));

    // one of the back end servers is removed from the (middle of the) pool
    servers.remove(1);

    // route the same identifier again against the shrunken pool
    bucket = Hashing.consistentHash(Hashing.md5().hashString("someId", UTF_8), servers.size());
    System.out.println("Second time routed to: " + servers.get(bucket));
}

Leads to the output:

First time routed to: server4
Second time routed to: server5

What I want is for that identifier ("someId") to map to the same server after removal of a server earlier in the list. So in the sample above, after removal I guess I'd want bucket 0 to map to "server1", bucket 1 to map to "server3", bucket 2 to map to "server4" and bucket 3 to map to "server5".

Am I supposed to maintain a separate (more complicated than a list) data structure to manage bucket removal and addition? I guess I had envisioned a more sophisticated Hashing API that would manage the remapping after the addition and removal of particular buckets for me.
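For illustration, the kind of API I had envisioned might look something like this (purely hypothetical; nothing like it exists in Guava):

import com.google.common.hash.HashCode;

// Hypothetical interface: a ring that tracks named buckets and remaps keys
// itself when a particular bucket is added or removed, so that only keys on
// the affected bucket move.
public interface ConsistentRing<B> {
    void addBucket(B bucket);
    void removeBucket(B bucket);
    B route(HashCode key);
}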

Note: I know the sample code uses a small input and bucket set. I tried this with thousands of inputs across 100 buckets and the result is the same: inputs that map to buckets 0-98 stay put when I shrink the bucket count to 99, and the inputs from bucket 99 get redistributed across the remaining 99 buckets.
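A minimal sketch of that experiment (the ShrinkDemo class, key format, and counts are my own) shows the same guarantee: when the bucket count shrinks from 100 to 99, the only keys that move are those that were in the removed bucket 99.

import static java.nio.charset.StandardCharsets.UTF_8;

import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;

public class ShrinkDemo {
    public static void main(String[] args) {
        int total = 10_000;
        int moved = 0;
        for (int i = 0; i < total; i++) {
            HashCode h = Hashing.md5().hashString("key-" + i, UTF_8);
            int before = Hashing.consistentHash(h, 100);
            int after = Hashing.consistentHash(h, 99);
            if (before != after) {
                moved++;
                // any key that moved must have come from the removed bucket
                if (before != 99) {
                    throw new AssertionError("unexpected remap from bucket " + before);
                }
            }
        }
        // roughly total / 100 keys (about 1%) are expected to move
        System.out.printf("%d of %d keys changed bucket%n", moved, total);
    }
}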


Solution

  • I'm afraid that no data structure can do it really right with the current consistentHash. As the method accepts only the list size, nothing but appending to and removing from the end can be supported. Currently, the best solution probably consists of replacing

    servers.remove(n)
    

    by

    servers.set(n, servers.get(servers.size() - 1));
    servers.remove(servers.size() - 1);
    

    This way you effectively swap the failed server with the very last one. This looks bad, as it makes the assignments to the two swapped servers wrong, but the problem is only half as bad since one of them has failed anyway. And it makes sense: after the subsequent removal of the last list element, everything is fine except for the assignments to the failed server and to the previously last server.

    So twice as many assignments as necessary change. Not optimal, but hopefully usable?
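    Put together, a minimal sketch of that workaround might look like this (the SwapRouter class and its method names are my own; only Hashing.consistentHash and Hashing.md5 come from Guava):

    import static java.nio.charset.StandardCharsets.UTF_8;

    import com.google.common.hash.Hashing;
    import java.util.ArrayList;
    import java.util.List;

    public class SwapRouter {
        private final List<String> servers;

        public SwapRouter(List<String> initialServers) {
            this.servers = new ArrayList<>(initialServers);
        }

        // Route an id to a server via consistent hashing over the current pool.
        public String route(String id) {
            int bucket = Hashing.consistentHash(
                    Hashing.md5().hashString(id, UTF_8), servers.size());
            return servers.get(bucket);
        }

        // Remove a failed server by overwriting its slot with the last server
        // and shrinking the list from the end. Buckets 0..size-2 keep their
        // meaning, so only ids on the failed server or on the previously last
        // server get re-routed.
        public void removeServer(String failed) {
            int n = servers.indexOf(failed);
            if (n < 0) {
                return; // unknown server, nothing to do
            }
            servers.set(n, servers.get(servers.size() - 1));
            servers.remove(servers.size() - 1);
        }
    }

    With the five-server pool from the question, removing "server2" this way re-routes only the ids that were on "server2" (unavoidable, since it failed) or on "server5" (the previously last server); everything else stays put.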