
G-Counters in Riak: Don't the underlying vclocks provide the same data?


I've been reading up on CvRDTs, and I'm aware that Riak 2 already ships with a few of them.

My question is: why would Riak implement a gcounter when it sounds like the underlying vclock that is associated with every object records the same information? Wouldn't the result be a gcounter stored with a vclock, each containing the same essential information?
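To make the comparison concrete, here is a minimal sketch of a G-Counter in Python. It is not Riak's implementation (which is written in Erlang); the class name, method names, and actor IDs are made up for illustration. The state is a map from actor to count, which is structurally the same shape as a version vector, and that similarity is what prompts the question.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: one non-negative count per actor (replica)."""

    def __init__(self) -> None:
        # Hypothetical representation: actor id -> increments seen from that actor.
        self.counts: Dict[str, int] = {}

    def increment(self, actor: str, amount: int = 1) -> None:
        """Record `amount` increments performed at `actor`."""
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[actor] = self.counts.get(actor, 0) + amount

    def value(self) -> int:
        """The counter's value is the sum over all actors."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        """Join two replicas by taking the per-actor maximum."""
        merged = GCounter()
        for actor in self.counts.keys() | other.counts.keys():
            merged.counts[actor] = max(
                self.counts.get(actor, 0), other.counts.get(actor, 0)
            )
        return merged


# Two replicas accept increments independently, then converge on merge.
replica_a = GCounter()
replica_b = GCounter()
replica_a.increment("node_a", 2)
replica_b.increment("node_b", 3)

merged = replica_a.merge(replica_b)
merged_again = merged.merge(replica_a)  # merging is idempotent
```

The merge (per-actor maximum) is the same operation used to join two version vectors; the only extra step a counter needs is the sum in `value()`.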

My only guess right now would be that Riak may garbage-collect the vclocks, trimming information that would actually be important for the purpose of a gcounter (i.e. the number of increments).

I cannot read Erlang particularly well, so maybe I've wrongly assumed that Riak stores vclocks with these special-case data types. However, the question still applies to the homegrown solutions that are written on top of standard Riak (and hence inherit vclocks with each object persisted).

EDIT:

I have since written the following article to help explain CvRDTs in a more practical manner. This article also touches on the redundancy I have highlighted above:

Conflict-free Replicated Data Types (CRDT) - A digestible explanation with less math.


Solution

    1. Riak prunes version vectors. For causality that is no big deal (pruning causes false concurrency and therefore more siblings, but it is safe); for counters it would be a disaster.

    2. Riak's CRDT support is general. We "hide" CRDTs inside the regular riak object.

    3. Riak's CRDT support is in its first wave; we'll be optimising further in future releases.

    We have a great mailing list for questions like this, by the way. Stack Overflow has its uses, but if you want to talk to the authors of an open source DB, why not use their list? Since Riak is open source, you can also submit a pull request; we'd love to incorporate your ideas into the code base.
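Point 1 above can be illustrated with a small Python sketch. It is a simplified model, not Riak's code: version vectors are plain dicts, and `drop_entry` is a hypothetical stand-in for pruning that simply removes one actor's entry (Riak's real pruning rules are more involved). The helper names and actor IDs are assumptions made for the example.

```python
from typing import Dict

Vector = Dict[str, int]


def descends(a: Vector, b: Vector) -> bool:
    """True if vector `a` has seen every event that `b` has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def concurrent(a: Vector, b: Vector) -> bool:
    """Neither vector descends the other, so the writes become siblings."""
    return not descends(a, b) and not descends(b, a)


def drop_entry(vector: Vector, actor: str) -> Vector:
    """Hypothetical pruning: forget one actor's entry entirely."""
    return {k: v for k, v in vector.items() if k != actor}


stored: Vector = {"a": 3, "b": 2, "c": 4}
pruned = drop_entry(stored, "b")

# Used as a version vector: a later write based on the pruned clock no longer
# descends the stored one, so the two are treated as concurrent. That yields
# an extra sibling, but no write is silently discarded.
next_write = dict(pruned)
next_write["c"] += 1
false_concurrency = concurrent(next_write, stored)

# Used as a G-Counter: the value is the sum of the entries, so dropping an
# entry silently loses that actor's increments.
count_before = sum(stored.values())
count_after = sum(pruned.values())
lost_increments = count_before - count_after
```

In this model the pruned vector still errs on the safe side for causality (a spurious sibling), while the same pruning applied to a counter loses two increments with no way to detect it. That is why the counter state cannot simply be read off the object's vclock.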