I have a distributed hash table (DHT) which is running on multiple instances of the same program, either on multiple machines or for testing on different ports on the same machine. These instances are started after each other. First, the base node is started, then the other nodes join it.
I am a little bit troubled how I should implement the join of the second node, in a way that it works with all the other nodes as well (all have of course the same program) without defining all border cases.
For a node to join, it sends a join message first, which gets passed to the correct node (here it's just the base node) and then answered with a notify message. With these two messages the predecessor of the base node and the successor of the existing nodes get set. But how does the other property get set? I know, that occasionally the nodes send a stabilise message to their successor, which compares it to its predecessor and returns it with a notify message and the predecessor in case it varies from the sender of the message.
Now, the base node can't send a message, because it doesn't know its successor, the new node can send one, but the predecessor is already valid.
I am guessing, both properties should point to the other node in the end, to be fully joined.
Here another diagram what I think should be the sequence i the third node joins. But again, when do I update the properties based on a stabilise message, when do I send a notify message back? In the diagram it is easy to see, but in code it is hard to decide.
Th trick is here to set the successor to the same value as the predecessor if it is NULL after the join-message has been received. Everything else gets handled nicely by the rest of the protocol.