mongodb mongoose theory vector-clock

Is the MongooseJS "versionKey" (__v field) a "vector clock"?


I've been using MongooseJS's versionKey for a while now - the __v field that it includes with documents by default. I understand the purpose of the version number and, generally, when it is updated.

I was recently speaking with a friend about the idea of a "vector clock" and I mentioned MongoDB and MongooseJS having this __v field. At the time, it sounded like this could be a vector clock. But having read a little bit about vector clocks, now I'm not sure.

So I'm wondering: Can the versionKey attribute of MongooseJS, and the __v field that it produces by default, be considered a vector clock? Yes, or no, and why?


Solution

  • In my opinion the versionKey you mention cannot be considered a vector clock. You could consider it a Lamport timestamp (or Lamport Clock) though.

    Let's take a broad look at what we are dealing with:

    Both Lamport timestamps and vector clocks are algorithms for establishing a causal order among events in a distributed system. In other words, both are used to order events that don't share a common reference clock.

    The Lamport timestamp algorithm uses a single counter for each process (for this question, think of a single counter for each document). The algorithm works as follows:

    1) Each time an event happens within the process (communication, modification, etc.), the counter is incremented before the event is timestamped.

    2) When a process sends a message to another process, it attaches the value of the counter to the message.

    3) When a process receives a message, it sets its counter to the greater of its current value and the received value, and then increments it.
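
    The three rules can be sketched in a few lines of JavaScript. This is only an illustration: the LamportClock class and its method names are made up for this answer and are not part of Mongoose or MongoDB. It uses the usual formulation in which the receiver moves its counter to max(local, received) + 1.

```javascript
// Minimal Lamport clock sketch. All names are hypothetical, not a library API.
class LamportClock {
  constructor() {
    this.counter = 0;
  }

  // Rule 1: every local event increments the counter.
  tick() {
    this.counter += 1;
    return this.counter;
  }

  // Rule 2: sending is itself an event; the new value travels with the message.
  send() {
    return this.tick();
  }

  // Rule 3: on receipt, move past both the local and the received value.
  receive(received) {
    this.counter = Math.max(this.counter, received) + 1;
    return this.counter;
  }
}

const a = new LamportClock();
const b = new LamportClock();

a.tick();               // a is now 1
const stamp = a.send(); // a is now 2, the message carries 2
b.receive(stamp);       // b becomes max(0, 2) + 1 = 3
```

    Note that each process carries a single number, so two timestamps can always be compared, even when the events they label are causally unrelated.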

    Here is an example of the algorithm applied to three processes:

    [Figure: Lamport timestamps for three processes]

    Lamport timestamps give each process a single counter, which makes it possible to determine which version of the process (or of the document, in the Mongoose case) is the latest.

    With this said, we can conclude that the versionKey is a mechanism that lets us know whether the version we are dealing with is the current one or is out of date.
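
    As a rough sketch of that idea, here is a plain in-memory version check. It is not Mongoose code and does not claim to mirror how Mongoose implements versioning; the store object and the saveIfCurrent helper are invented for the example.

```javascript
// In-memory stand-in for a collection. This is NOT Mongoose code.
const store = { doc1: { __v: 0, tags: ["a", "b"] } };

// Hypothetical helper: apply `change` only if the caller's copy has the
// same version as the stored document, then bump the version.
function saveIfCurrent(id, loadedVersion, change) {
  const current = store[id];
  if (current.__v !== loadedVersion) {
    return false; // the caller's copy is out of date
  }
  change(current);
  current.__v += 1;
  return true;
}

// Two clients load the document while it is at version 0.
const firstLoaded = store.doc1.__v;
const secondLoaded = store.doc1.__v;

// The first write succeeds and moves the document to version 1.
const firstOk = saveIfCurrent("doc1", firstLoaded, (d) => d.tags.reverse());
// The second write still carries version 0, so it is rejected.
const secondOk = saveIfCurrent("doc1", secondLoaded, (d) => d.tags.push("c"));
```

    The single number is enough to tell "current" from "stale", but it says nothing about who made the intervening change.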

    As Aaron Heckmann points out in his blog post about versioning in Mongoose (Mongoose v3 part 1 :: Versioning):

    In version 3, documents now have an increment() method which manually forces incrementation of the document version. This is also used internally whenever an operation on an array potentially alters array element position.

    So, out of the box, the versionKey only comes into play when you modify an array in a document in a way that may change the position of its elements.

    On the other hand, Aaron states that the increment() method manually forces the document version to be incremented. If you implemented Lamport's algorithm, you could call this method to increment the version and so satisfy the first rule of the algorithm. In that case you would be using the versionKey as a Lamport timestamp.

    So (here comes the actual answer to your question), why can't the versionKey be considered a vector clock? Because it is a single counter per document, whereas a vector clock keeps one counter per node.

    Here is an extract from the Dynamo paper:

    Dynamo uses vector clocks in order to capture causality between different versions of the same object. A vector clock is effectively a list of (node, counter) pairs. One vector clock is associated with every version of every object. One can determine whether two versions of an object are on parallel branches or have a causal ordering, by examine their vector clocks. If the counters on the first object’s clock are less-than-or-equal to all of the nodes in the second clock, then the first is an ancestor of the second and can be forgotten. Otherwise, the two changes are considered to be in conflict and require reconciliation.
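
    The comparison described in that extract can be sketched as follows. This is an illustration under assumptions, not Dynamo's implementation: a clock is represented as a plain object mapping node names to counters, a missing entry is treated as 0, and the node names and function names are made up.

```javascript
// A vector clock as a plain object: { nodeName: counter }. Names are hypothetical.
// True when every counter in `a` is <= the matching counter in `b`
// (an entry missing from `b` counts as 0).
function lessOrEqual(a, b) {
  return Object.keys(a).every((node) => a[node] <= (b[node] || 0));
}

// Classify the first clock relative to the second.
function compare(a, b) {
  const aLeqB = lessOrEqual(a, b);
  const bLeqA = lessOrEqual(b, a);
  if (aLeqB && bLeqA) return "equal";
  if (aLeqB) return "ancestor";   // a happened before b
  if (bLeqA) return "descendant"; // b happened before a
  return "conflict";              // parallel branches, needs reconciliation
}

const v1 = { nodeA: 1 };
const v2 = { nodeA: 2 };           // nodeA wrote again on top of v1
const v3 = { nodeA: 1, nodeB: 1 }; // nodeB wrote on top of v1, unaware of v2

compare(v1, v2); // "ancestor"
compare(v2, v3); // "conflict"
```

    A single integer such as __v cannot express the last case: v2 and v3 would both simply be "one ahead of v1", and nothing would show that they diverged.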

    So I wouldn't consider the versionKey a vector clock; I would consider it a Lamport timestamp with some workarounds.