distributed-computing, clock, distributed-system, trusted-timestamp, vector-clock

How to determine last write wins on concurrent vector clocks?


I'd like to keep only the most recent data, and also employ vector clocks to resolve conflicts so I can easily discard stale data via the last-write-wins (LWW) rule. Say we have 3 nodes:

- Node1
- Node2
- Node3

Then we would use vector clocks to track causality and concurrency for each event/change. We initially represent each node's vector clock as

{Node1:0, Node2:0, Node3:0}.

For instance, if Node1 makes 5 local changes, we would increment its clock 5 times, resulting in

{Node1: 5, Node2:0, Node3:0}.

This would normally be okay, right?

Then what if, at the same time, Node2 updates its local data and also increments its clock, resulting in

{Node1:0, Node2:1, Node3:0}.

At some point Node1 sends an event to Node3, passing the updates and piggybacking its vector clock. Node3, which has a VC of {Node1:0, Node2:0, Node3:0}, can simply merge the data and the clock, as it has no changes of its own yet.
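The merge step described above can be sketched as follows. This is a minimal illustration, not any particular library's API; `vc_merge` is a hypothetical name, and bumping the receiver's own entry on delivery is one common design choice (it matches the Node3:1 entry shown below):

```python
def vc_merge(local, incoming, self_id):
    """Element-wise max of two vector clocks, then bump the
    receiver's own entry to record the delivery event."""
    merged = {node: max(local.get(node, 0), incoming.get(node, 0))
              for node in set(local) | set(incoming)}
    merged[self_id] = merged.get(self_id, 0) + 1
    return merged

# Node3 merges in Node1's piggybacked clock:
vc_merge({"Node1": 0, "Node2": 0, "Node3": 0},
         {"Node1": 5, "Node2": 0, "Node3": 0},
         "Node3")
# equals {"Node1": 5, "Node2": 0, "Node3": 1}
```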

The problem I'm thinking about is what happens if Node2 then sends an event to Node3, passing its own VC and updates. What happens to the data and the clocks? How do I apply last write wins here, when the first write that reached Node3 (the one from Node1) would appear to be the later write, simply because it has a greater value in its own clock entry?

Node3's clock before merging: {Node1: 5, Node2: 0, Node3: 1}
Node2's message VC that Node3 received: {Node1: 0, Node2: 1, Node3: 0}

How do I handle resolving data on concurrent VCs?
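For reference, the standard pairwise comparison of vector clocks looks like this sketch (the function name is mine). Applied to the two clocks above, it reports exactly the situation in question: neither clock dominates the other, so the writes are concurrent and the VCs alone cannot order them:

```python
def compare(a, b):
    """Classify two vector clocks as 'before', 'after',
    'equal', or 'concurrent' (missing entries count as 0)."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

# Node3's clock vs. the message VC from Node2:
compare({"Node1": 5, "Node2": 0, "Node3": 1},
        {"Node1": 0, "Node2": 1, "Node3": 0})
# → "concurrent"
```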


Solution

  • This is a good question. You're running into this issue because you are using counters in your vector clocks, and the counters are not synchronized across nodes. A vector clock can tell you that two writes are concurrent, but by itself it cannot tell you which one is "later" — comparing the magnitudes of unsynchronized counters is meaningless. You have a couple of options:

    1. Submit all writes through one primary server. The primary server can apply a total order to all of the writes and then send them to the individual nodes to be stored. It would be helpful to have some background on your system. For example, why are there three independent nodes? Do they exist to provide replication and availability? If so, this primary server approach would work well.
    2. Keep your servers' times in sync, as described in Google's Spanner paper. Then, instead of using a monotonically increasing counter for each node in the vector clock, you can use a timestamp based on the server's time. Again, background on your system would be helpful. If your system just consists of human users submitting writes, you might get away with keeping your servers' times loosely in sync using NTP without violating the LWW invariant.
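With the second option, LWW resolution for concurrent writes reduces to comparing timestamps. A minimal sketch, assuming each write carries a wall-clock timestamp and an originating node id (the `lww_resolve` name and the write layout are mine, and the timestamps are only meaningful if clocks are kept loosely in sync as described above); ties are broken deterministically by node id so every replica converges on the same winner:

```python
def lww_resolve(local, incoming):
    """Keep the write with the later timestamp; break exact
    timestamp ties by node id so all replicas agree."""
    if incoming["ts"] > local["ts"]:
        return incoming
    if incoming["ts"] == local["ts"] and incoming["node"] > local["node"]:
        return incoming
    return local

a = {"node": "Node1", "ts": 1700000000.004, "value": "x"}
b = {"node": "Node2", "ts": 1700000000.007, "value": "y"}
lww_resolve(a, b)["value"]  # → "y" (b has the later timestamp)
```

Note the usual LWW caveat: a write is silently discarded whenever a concurrent write carries a later timestamp, so clock skew directly translates into lost updates.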