distributeddistributed-systemconsistency

Can Sloppy Quorum guarantee strong read consistency?


In the book "Designing Data-Intensive Applications. The Big Ideas Behind Reliable, Scalable and Maintainable Systems", we can read regarding Sloppy Quorum :

However, this means that even when w + r > n, you cannot be sure to read the latest value for a key, because the latest value may have been temporarily written to some nodes outside of n

But i can't seem to understand the meaning behind this.

Let's take an example of clients writing different values for a key "K1".

Suppose we have :

We have W+R>N.

Suppose we already have a first value for "K1" in A, B, and C : (K1,V1)

Then suppose nodes A and B fail, then a client writes (K1,V2) in a Sloppy Quorum configuration.

My understanding is that we will use nodes C, D, and E to write and replicate (K1, V2) in order to avoid blocking the write operation like in a Strict Quorum configuration, and we need the acknowledgment of two of these nodes. Let's say we received acknowledgements from C and D, so the write is successfull.

Now another client wants to read the value of K1, its need two responses, and we have three possibilities :

  1. C and D respond first : we are sure to get (K1, V2) because these are the nodes that aknowledged the write of (K1,V2)
  2. C and E respond first : we might either get two (K1, V2), or a (K1, V2) from C and a (K1, V1) from E if replication to E did not finish yet. In the latter case it is still possible to resolve the conflict using versionning algorithms like Vector Clock. So as long as our system is able to resolve conflicts we are sure to get (K1, V2) (also, such conflict can also happen in a Strict Quorum configuration so it is not really due to the Sloppy Quorum configuration in our case)
  3. D and E respond first : same as the above (i.e. C and E case)

So from this, i would say that our Sloppy Quorum system can guarantee strong read consistency (as long as it handles conflicts, which again is also something that should be handled in Strict Quorum configurations) since the returned value would be (K1,V2), which is not what seems to be explained in the book.

Is there anything wrong in this reasoning ?


Solution

  • After some research and analysis, i think i managed to understand what the author meant.

    The reasoning in the question is correct, but incomplete :

    After the second client gets the value (K1,V2), suppose nodes A and B are restored.

    Normally, since we are in a Sloppy Quorum configuration, there will be a Hinted Handoff phase where nodes D and E send recent data back to A and B when these are restored. We can read in the book :

    Once the network interruption is fixed, any writes that one node temporarily accepted on behalf of another node are sent to the appropriate “home” nodes. This is called hinted handoff.

    The problem with our Sloppy Quorum configuration is that, if another client wants to read the value of K1 right after nodes A and B are restored, but before Hinted Handoff has started, it is possible that this client gets two responses from A and B with the old value (K1,V1), and the client will accept this value since R=2. So the client will not get the latest value in this case, just as explained in the book.

    Clients can safely get the latest value (K1, V2) only after Hinted Handoff has completed as explained by the author :

    Thus, a sloppy quorum actually isn’t a quorum at all in the traditional sense. It’s only an assurance of durability, namely that the data is stored on w nodes somewhere. There is no guarantee that a read of r nodes will see it until the hinted handoff has completed.

    So the answer to the question "Can Sloppy Quorum guarantee strong read consistency?" would be : No.