cassandra

EACH_QUORUM VS QUORUM


This is a screenshot of the consistency level table from the DataStax documentation:

[Screenshot: consistency level table]

What is the difference between EACH_QUORUM and QUORUM? "Each DC" and "all DCs" mean the same thing, AFAIK. The QUORUM row states the following:

Some level of failure is possible

Why? Is it because one node can be down in each DC? The same applies to EACH_QUORUM, right? Why does EACH_QUORUM not allow some level of failure, since it requires a quorum in each DC and not ALL?

Both levels have the same in common (AFAIK):


Solution

  • The difference between QUORUM and EACH_QUORUM is as follows.

    Assume you have 6 nodes in your cluster - 2 DCs with 3 nodes each and RF=3 for both DCs (all nodes have all data).

    The number of replicas that must respond is the same for QUORUM and EACH_QUORUM: 4 (6/2 + 1, where 6 is the sum of the replication factors across both DCs). However, which nodes can respond differs. EACH_QUORUM is satisfied by fewer combinations of responding nodes.

    QUORUM requires 4 nodes to respond, in any combination. For example, 3 nodes from the local DC and 1 node from the remote DC may respond. That's perfectly fine.

    With EACH_QUORUM, a quorum must respond in each DC. That means 2 nodes from each DC must respond in this case, and that's it (which 2 nodes in each DC is irrelevant). 3 nodes from the local DC and 1 node from the remote DC does not qualify, as 1 node in the remote DC is not a quorum of that DC.
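The 6-replica comparison above can be sketched in a few lines of Python. This is only an illustrative model, not Cassandra's own code: the helper names (`quorum`, `satisfies_quorum`, `satisfies_each_quorum`) and the dictionaries of per-DC counts are made up for the example, and it assumes the quorum size is floor(replicas / 2) + 1.

```python
def quorum(replicas):
    """Quorum size for a number of replicas: floor(n / 2) + 1."""
    return replicas // 2 + 1


def satisfies_quorum(responses, rf):
    """QUORUM: enough replicas responded overall, from any mix of DCs."""
    return sum(responses.values()) >= quorum(sum(rf.values()))


def satisfies_each_quorum(responses, rf):
    """EACH_QUORUM: a quorum of replicas responded in every DC."""
    return all(responses.get(dc, 0) >= quorum(n) for dc, n in rf.items())


# Hypothetical cluster: replication factor per DC (RF=3 in both DCs).
rf = {"DC1": 3, "DC2": 3}

lopsided = {"DC1": 3, "DC2": 1}  # 3 local responses + 1 remote response
balanced = {"DC1": 2, "DC2": 2}  # 2 responses from each DC

print(satisfies_quorum(lopsided, rf), satisfies_each_quorum(lopsided, rf))  # True False
print(satisfies_quorum(balanced, rf), satisfies_each_quorum(balanced, rf))  # True True
```

Both response sets contain 4 replicas, but only the balanced one meets the per-DC requirement.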

    Let's change the cluster node count to 7 instead of 6. DC1 has 4 nodes, DC2 has 3 nodes. DC1 RF = 4 and DC2 RF = 3 (again, all nodes have all the data). Here's where the fun begins, with an odd total RF.

    I'm not sure about the word "failure", but I can see certain scenarios where this could be problematic.

    For QUORUM, 4 nodes need to respond (7/2 + 1 = 4, using integer division). Any 4 nodes will do, including the case where all 4 responses come from the local, larger DC (DC1 in this case). What if the most current data is on DC2? In this scenario, you could end up with undesirable (stale) results.

    With EACH_QUORUM, 5 nodes would need to respond (quorum of DC1 = 4/2 + 1 = 3, quorum of DC2 = 3/2 + 1 = 2 with integer division, so the total is 5). This forces Cassandra to return data from both DCs, with a quorum from each DC, which should give you good results.
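To double-check the arithmetic for the 7-replica case, here is a small Python sketch. The `quorum` helper and the `rf` dictionary are illustrative names for this example, not part of any Cassandra API, and it assumes the quorum size is floor(replicas / 2) + 1.

```python
def quorum(replicas):
    # Integer division, so quorum(3) == 2 and quorum(4) == 3.
    return replicas // 2 + 1


# Hypothetical cluster: DC1 has RF=4, DC2 has RF=3.
rf = {"DC1": 4, "DC2": 3}

quorum_total = quorum(sum(rf.values()))
each_quorum_per_dc = {dc: quorum(n) for dc, n in rf.items()}
each_quorum_total = sum(each_quorum_per_dc.values())

print(quorum_total)        # 4
print(each_quorum_per_dc)  # {'DC1': 3, 'DC2': 2}
print(each_quorum_total)   # 5

# All four DC1 replicas answering meets QUORUM without touching DC2,
# but it does not meet EACH_QUORUM.
dc1_only = {"DC1": 4, "DC2": 0}
meets_quorum = sum(dc1_only.values()) >= quorum_total
meets_each_quorum = all(
    dc1_only[dc] >= needed for dc, needed in each_quorum_per_dc.items()
)
print(meets_quorum, meets_each_quorum)  # True False
```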

    Again, I'm trying to work out where the additional "failures" could come from with QUORUM vs. EACH_QUORUM, and off the top of my head I can't see it. If anything, EACH_QUORUM with an odd node count seems less tolerant of unavailable nodes, since a quorum in each DC must respond, vs. any quorum-sized set of nodes from any DC. I can see where QUORUM may give you undesirable results, though (explained above).
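One way to make the "less flexible" point concrete is to brute-force every possible set of unavailable replicas in the 7-replica example and count how many of them each level can still survive. The sketch below is a toy model under stated assumptions, not Cassandra behaviour verified against a real cluster: every replica holds the data, the quorum size is floor(replicas / 2) + 1, and all function names are invented for the example.

```python
from itertools import combinations


def quorum(replicas):
    return replicas // 2 + 1


# Hypothetical cluster: DC1 has RF=4, DC2 has RF=3, giving 7 replicas in total.
rf = {"DC1": 4, "DC2": 3}
replicas = [(dc, i) for dc, n in rf.items() for i in range(n)]


def alive_per_dc(down):
    """Number of reachable replicas in each DC, given a set of down replicas."""
    return {
        dc: sum(1 for r in replicas if r[0] == dc and r not in down) for dc in rf
    }


def quorum_possible(down):
    return sum(alive_per_dc(down).values()) >= quorum(len(replicas))


def each_quorum_possible(down):
    alive = alive_per_dc(down)
    return all(alive[dc] >= quorum(n) for dc, n in rf.items())


# Every subset of the 7 replicas is one possible outage (2**7 = 128 of them).
outages = [
    set(c) for k in range(len(replicas) + 1) for c in combinations(replicas, k)
]
quorum_ok = sum(quorum_possible(d) for d in outages)
each_quorum_ok = sum(each_quorum_possible(d) for d in outages)
print(len(outages), quorum_ok, each_quorum_ok)  # 128 64 20

# Losing all of DC2 still leaves QUORUM reachable, but not EACH_QUORUM.
dc2_down = {r for r in replicas if r[0] == "DC2"}
print(quorum_possible(dc2_down), each_quorum_possible(dc2_down))  # True False
```

In this model QUORUM tolerates any 3 replicas being down, while EACH_QUORUM tolerates at most 1 down replica per DC.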