cassandra, cassandra-2.0, cassandra-2.1

Does Cassandra read have inconsistency?


I am new to Cassandra and am trying to understand how it works. Say a write goes to a number of nodes. My understanding is that the hash of the key decides which node owns the data, and then replication happens. When reading the data, the hash of the key determines which node has the data, and that node responds. My question is: if reads and writes go to the same set of nodes, which always have the data, how does read inconsistency occur, and why does Cassandra return stale data?


Solution

  • To tune consistency, Cassandra lets you set the consistency level on a per-query basis.

    Now for your question: let's assume the consistency level is set to ONE and the replication factor is 3.

    During a write, the coordinator sends the write request to all replicas that own the row being written. As long as all replica nodes are up and available, they will get the write regardless of the consistency level specified by the client. The write consistency level determines how many replica nodes must respond with a success acknowledgment for the write to be considered successful. Success means that the data was written to the commit log and the memtable.

    For example, in a single data center 10 node cluster with a replication factor of 3, an incoming write will go to all 3 nodes that own the requested row. If the write consistency level specified by the client is ONE, the first node to complete the write responds back to the coordinator, which then proxies the success message back to the client. A consistency level of ONE means that it is possible that 2 of the 3 replicas could miss the write if they happened to be down at the time the request was made. If a replica misses a write, Cassandra will make the row consistent later using one of its built-in repair mechanisms: hinted handoff, read repair, or anti-entropy node repair.
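The write path described above can be sketched as a small simulation. This is a toy model, not Cassandra code: the `Replica` class, `coordinator_write`, and the `REQUIRED_ACKS` table are hypothetical names made up for illustration, and the replication factor is assumed to be 3.

```python
from dataclasses import dataclass, field

# Acknowledgments each consistency level waits for, assuming RF = 3.
REQUIRED_ACKS = {"ONE": 1, "QUORUM": 2, "ALL": 3}


@dataclass
class Replica:
    """Toy stand-in for a replica node (hypothetical, for illustration only)."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, timestamp, value):
        """Store the write if the replica is reachable; the return value is the ack."""
        if not self.up:
            return False
        self.data[key] = (timestamp, value)
        return True


def coordinator_write(replicas, key, timestamp, value, consistency="ONE"):
    """Send the write to every replica; report success if enough of them ack."""
    acks = sum(r.write(key, timestamp, value) for r in replicas)
    return acks >= REQUIRED_ACKS[consistency]


# Two of the three replicas are down when the write arrives.
replicas = [Replica("r1"), Replica("r2", up=False), Replica("r3", up=False)]
ok = coordinator_write(replicas, "user:42", timestamp=1, value="v1", consistency="ONE")
holders = [r.name for r in replicas if "user:42" in r.data]

print(ok)       # True: a single ack satisfies ONE
print(holders)  # ['r1']: the other two replicas missed the write
```

The same write at QUORUM would be reported as failed here, because only one replica can acknowledge it.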

    By default, hints are saved for three hours after a replica fails because if the replica is down longer than that, it is likely permanently dead. You can configure this interval of time using the max_hint_window_in_ms property in the cassandra.yaml file. If the node recovers after the save time has elapsed, run a repair to re-replicate the data written during the down time.
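As a quick check of the numbers, three hours expressed in milliseconds is the value `max_hint_window_in_ms` would hold by default. The helper below is a hypothetical, simplified rule of thumb based on the paragraph above, not part of Cassandra.

```python
# Three hours in milliseconds, matching the default described in the text.
MAX_HINT_WINDOW_IN_MS = 3 * 60 * 60 * 1000


def needs_manual_repair(downtime_ms, window_ms=MAX_HINT_WINDOW_IN_MS):
    """True when the node was down longer than the hint window (simplified rule)."""
    return downtime_ms > window_ms


print(MAX_HINT_WINDOW_IN_MS)                    # 10800000
print(needs_manual_repair(2 * 60 * 60 * 1000))  # False: hints cover the outage
print(needs_manual_repair(5 * 60 * 60 * 1000))  # True: run a repair
```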

    When a read request is performed, the coordinator node sends it to the replicas that can currently respond the fastest, so with consistency ONE it might go to any one of the 3 replicas.

    Now imagine that the data has not yet been replicated to the third replica, and that replica is selected for the read (the chances are small). You then get inconsistent, that is, stale data.
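The stale-read scenario can be illustrated with a minimal sketch. It assumes a toy model in which each replica stores a `(timestamp, value)` pair and the newest timestamp wins when several replicas are consulted; the replica names and the `read` helper are hypothetical.

```python
import itertools

# (write timestamp, value) held by each of the three replicas.
# r3 has not received the latest write yet.
replicas = {
    "r1": (2, "new"),
    "r2": (2, "new"),
    "r3": (1, "old"),
}


def read(contacted):
    """Return the value with the newest timestamp among the contacted replicas."""
    timestamp, value = max(replicas[name] for name in contacted)
    return value


# Consistency ONE: the result depends on which single replica answers.
one_results = {name: read([name]) for name in replicas}
print(one_results)  # {'r1': 'new', 'r2': 'new', 'r3': 'old'}

# Contacting any two replicas always includes at least one up-to-date copy.
two_results = {read(pair) for pair in itertools.combinations(replicas, 2)}
print(two_results)  # {'new'}
```

In this model a read at ONE that lands on `r3` returns the old value, while a read that consults two of the three replicas cannot miss the newer write.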

    This scenario assumes all nodes are up. If one of the nodes was down and read repair has not run since it came back up, that adds to the problem.
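The idea behind read repair can be sketched as follows. This is a deliberately simplified model: it reads full values from every replica and overwrites stale copies, whereas real Cassandra compares digests and decides which replicas to consult based on the consistency level and table settings. `read_with_repair` is a hypothetical helper.

```python
def read_with_repair(replicas, key):
    """Read from every replica, return the newest value, and fix stale copies."""
    newest = max(store[key] for store in replicas.values() if key in store)
    for store in replicas.values():
        if store.get(key) != newest:
            store[key] = newest  # push the newest (timestamp, value) to the stale replica
    return newest[1]


replicas = {
    "r1": {"user:42": (2, "new")},
    "r2": {"user:42": (2, "new")},
    "r3": {"user:42": (1, "old")},  # was down during the latest write
}

result = read_with_repair(replicas, "user:42")
print(result)                     # new
print(replicas["r3"]["user:42"])  # (2, 'new'): r3 is consistent again
```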

    READ With Different CONSISTENCY LEVEL

    READ Request in Cassandra