Tags: java, p2p, kademlia

Efficient Kademlia Buckets


I'm writing a modified Kademlia P2P system here, but the problem I'm describing is very similar to one faced by an implementation of the original design.

So, what's the most efficient way of implementing k-buckets? What matters to me is access time, parallelism (concurrent reads and writes), and memory consumption.

I thought about doing it with a ConcurrentLinkedQueue and a ConcurrentHashMap, but that's pretty redundant and nasty, isn't it?

At the moment I'm simply synchronizing a LinkedList.

Here is my code:

import java.util.LinkedList;

class Bucket {
    private final LinkedList<Neighbour> neighbours;
    private final Object lock;

    Bucket() {
        neighbours = new LinkedList<>();
        lock = new Object();
    }

    // Insert n, or move it to the tail (most recently seen) if already present.
    void sync(Neighbour n) {
        synchronized(lock) {
            int index = neighbours.indexOf(n);
            if(index == -1) {
                neighbours.add(n);
                n.updateLastSeen();
            } else {
                Neighbour old = neighbours.remove(index);
                neighbours.add(old);
                old.updateLastSeen();
            }
        }
    }

    void remove(Neighbour n) {
        synchronized(lock) {
            neighbours.remove(n);
        }
    }

    // Return the neighbour matching the node, or rotate the head (least
    // recently seen) to the tail and use it as the next hop.
    Neighbour resolve(Node n) throws ResolveException {
        synchronized(lock) {
            int index = neighbours.indexOf(n);
            if(index == -1) {
                Neighbour nextHop = neighbours.poll();
                if(nextHop == null) { // don't re-add null when the bucket is empty
                    throw new ResolveException();
                }
                neighbours.add(nextHop);
                return nextHop;
            } else {
                return neighbours.get(index);
            }
        }
    }
}

Please don't wonder about the missing eviction logic; I have implemented the neighbour eviction process elsewhere.


Solution

  • So, what's the most efficient way of implementing k-Buckets?

    That depends. If you want an implementation with bells and whistles (e.g. bucket splitting, multi-homing) then you need a flexible list or a tree. In my experience a copy-on-write array + binary search works well for the routing table, because you rarely modify the total number of buckets, just the contents of each bucket.

    With CoW semantics you need less locking, since you can just fetch the current copy of the array, retrieve the bucket of interest and then lock on the bucket. Or use an atomic array inside each bucket. Of course, such optimizations are only necessary if you expect high throughput; most DHT nodes see very little traffic, a few packets per second at most, so there is no need to involve multiple threads unless you implement a specialized node with so much throughput that multiple threads are needed to process the data.

    CoW works less well for routing-table-like lookup caches or for the ephemeral visited-node/target-node sets built during a lookup, since those are modified rapidly. A ConcurrentSkipListMap can be a better choice if you are expecting high load.

    If you want a simplified, approximate implementation, then just use a fixed-size array of 160 elements, where the array index is the number of prefix bits shared with your node ID. This performs reasonably well, but doesn't allow for some of the optimizations proposed by the full Kademlia paper.
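To make the copy-on-write idea concrete, here is a rough sketch of lock-free reads over a bucket array. Everything in it (the `CowRoutingTable`, `findBucket`, and `split` names, and the simplification of IDs to `long` lower bounds) is my own assumption for illustration, not code from the question: readers binary-search an immutable snapshot without locking, while a split copies the array and swaps it in atomically.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: copy-on-write bucket array with lock-free reads.
final class CowRoutingTable {
    // Each bucket covers a contiguous ID range; the array is sorted by lowerBound.
    static final class Bucket {
        final long lowerBound;            // inclusive start of the range (IDs simplified to long)
        final Object lock = new Object(); // per-bucket lock for modifying bucket contents
        Bucket(long lowerBound) { this.lowerBound = lowerBound; }
    }

    private final AtomicReference<Bucket[]> bucketsRef =
            new AtomicReference<>(new Bucket[] { new Bucket(0) });

    // Readers: grab the current snapshot and binary-search it; no locking needed.
    Bucket findBucket(long id) {
        Bucket[] snapshot = bucketsRef.get();
        int lo = 0, hi = snapshot.length - 1;
        while (lo < hi) {                 // find the last bucket with lowerBound <= id
            int mid = (lo + hi + 1) >>> 1;
            if (snapshot[mid].lowerBound <= id) lo = mid; else hi = mid - 1;
        }
        return snapshot[lo];
    }

    // Writers: splitting a bucket copies the array and swaps it in atomically.
    void split(Bucket b, long midpoint) {
        Bucket[] old, updated;
        do {
            old = bucketsRef.get();
            int i = Arrays.binarySearch(old, b,
                    (x, y) -> Long.compare(x.lowerBound, y.lowerBound));
            if (i < 0) return;            // bucket already replaced by a concurrent split
            updated = new Bucket[old.length + 1];
            System.arraycopy(old, 0, updated, 0, i + 1);
            updated[i + 1] = new Bucket(midpoint);
            System.arraycopy(old, i + 1, updated, i + 2, old.length - i - 1);
        } while (!bucketsRef.compareAndSet(old, updated));
    }
}
```

The point of the design is that `findBucket` never blocks: a reader may see a slightly stale snapshot, which is harmless for routing, and only mutations of a bucket's contents take the per-bucket lock.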
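The simplified fixed-size variant above might look like the following sketch. The class and method names (`SimpleRoutingTable`, `seen`, `sharedPrefixBits`) and the choice of `byte[20]` IDs are my assumptions; the structure is just the 160-slot array indexed by shared prefix length, with each slot keeping at most k entries ordered by last contact.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical sketch: one bucket per shared-prefix length, 160-bit IDs as byte[20].
final class SimpleRoutingTable {
    static final int ID_BITS = 160;
    static final int K = 20; // bucket capacity k from the Kademlia paper

    private final byte[] selfId;
    private final ArrayDeque<byte[]>[] buckets; // each deque ordered least- to most-recently seen

    @SuppressWarnings("unchecked")
    SimpleRoutingTable(byte[] selfId) {
        this.selfId = selfId;
        this.buckets = new ArrayDeque[ID_BITS];
        for (int i = 0; i < ID_BITS; i++) buckets[i] = new ArrayDeque<>();
    }

    // Number of leading bits the two IDs share; 160 means the IDs are identical.
    static int sharedPrefixBits(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int x = (a[i] ^ b[i]) & 0xFF;
            if (x != 0) return i * 8 + Integer.numberOfLeadingZeros(x) - 24;
        }
        return a.length * 8;
    }

    int bucketIndexFor(byte[] id) {
        int shared = sharedPrefixBits(selfId, id);
        return Math.min(shared, ID_BITS - 1); // clamp so our own ID maps to the last bucket
    }

    // Record a contact: move the ID to the most-recently-seen end of its bucket.
    synchronized void seen(byte[] id) {
        ArrayDeque<byte[]> bucket = buckets[bucketIndexFor(id)];
        bucket.removeIf(e -> Arrays.equals(e, id));
        if (bucket.size() < K) bucket.addLast(id);
        // else: ping bucket.peekFirst() and evict it if unresponsive (omitted here,
        // since the question says eviction is handled elsewhere)
    }
}
```

Compared to the full tree-of-buckets scheme, this wastes the high-index buckets (very few nodes share a long prefix with you) and cannot split the crowded low-index ones, which is exactly the trade-off the answer mentions.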