
Consistency and timeout issues with Cassandra 2.2


I'm using Cassandra 2.2 and I have an application that requires a high level of consistency.

I've configured a single-datacenter cluster with 3 nodes. My keyspace was created with a replication_factor of 2. In each node's cassandra.yaml I've listed 2 seeds (for example NODE_1 and NODE_3).

The important thing is that my app should remain fully functional even if one node is down.

Currently I'm having issues with consistency and timeouts when my app contacts the cluster.

I've read the whole Cassandra 2.2 documentation and concluded that the best consistency level for my write operations is QUORUM and for my read operations ONE, but I still have some consistency issues.

First of all, is this the right choice to get a strong level of consistency? Also, are UPDATE and DELETE operations considered write or read operations, given that an UPDATE with a WHERE clause still seems to have to 'read' data? I'm not sure, especially in the context of Cassandra's write path.

My second issue is timeouts during write operations. A simple, lightweight INSERT sometimes gets "Cassandra timeout during write query at consistency QUORUM (2 replicas were required but only 1 acknowledged the write)" or sometimes even "... 0 acknowledged", even though all 3 of my nodes are UP.

Are there other parameters I should check, such as write_request_timeout_in_ms, whose default value is 2000 ms (which already seems high)?


Solution

  • You will have strong consistency with Replication Factor = 2 and Consistency Level = QUORUM for write operations and ONE for read operations. However, write operations will fail if one node is down, because with Replication Factor = 2, Consistency Level = QUORUM is the same as ALL.
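    The arithmetic behind this can be sketched in Python. It assumes the usual quorum formula, floor(RF / 2) + 1, and the common rule that reads are guaranteed to see the latest successful write when the replicas written plus the replicas read exceed RF. The helper names `quorum` and `is_strongly_consistent` are hypothetical and not part of any Cassandra driver.

```python
def quorum(rf: int) -> int:
    """Replicas a QUORUM request must reach for replication factor rf."""
    return rf // 2 + 1


def is_strongly_consistent(replicas_written: int, replicas_read: int, rf: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when W + R > RF."""
    return replicas_written + replicas_read > rf


RF = 2
W = quorum(RF)  # QUORUM write: 2 replicas, the same as ALL when RF = 2
R = 1           # ONE read

print("QUORUM for RF=2:", W)
print("strong consistency:", is_strongly_consistent(W, R, RF))
print("replicas that may be down for a QUORUM write:", RF - W)
```

    With RF = 2 the overlap condition holds (2 + 1 > 2), but zero replicas may be down for a QUORUM write, which is the availability problem described above.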

    You should use Replication Factor = 3 and Consistency Level = QUORUM for both write and read operations to get strong consistency and a fully functional app even if one node is down.
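    A small sketch comparing the two setups, using a hypothetical `quorum` helper based on the floor(RF / 2) + 1 formula. It only counts replicas and ignores timeouts, hinted handoff and repair.

```python
def quorum(rf: int) -> int:
    """Replicas a QUORUM request must reach for replication factor rf."""
    return rf // 2 + 1


results = {}
for rf in (2, 3):
    q = quorum(rf)
    results[rf] = {
        "quorum": q,
        # Replicas that can be down while QUORUM requests still succeed.
        "tolerated_down": rf - q,
        # QUORUM writes plus QUORUM reads overlap in at least one replica.
        "quorum_rw_strong": q + q > rf,
    }
    print(f"RF={rf}: {results[rf]}")
```

    For RF = 3 the quorum is still 2, so one replica can be down while QUORUM reads and writes keep overlapping; for RF = 2 no replica can be down.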

    DELETE and UPDATE operations are write operations.

    For the timeout issue, please provide your table model and the queries that fail.

    Updated

    Consistency level applies to replicas, not nodes.

    Replication factor = 2 means that each partition is stored on 2 of the 3 nodes. Those 2 nodes are that partition's replicas.

    With replication factor = 2, QUORUM means that a write must be acknowledged by 2 replicas of the partition, not by any 2 nodes.

    Cassandra places data on nodes according to the partition key. Each node is responsible for a range of partition key tokens, so a given node cannot store just any data. To perform an operation you therefore need enough live replicas for that partition key, not just enough live nodes. Here is an article about data replication and distribution.
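    Placement by partition key can be illustrated with a toy ring. This is a simplified sketch under stated assumptions: one token per node (no vnodes), SimpleStrategy-style placement where the first replica is the node owning the key's token and further replicas are the next nodes clockwise, and MD5 as a stand-in for Cassandra's default Murmur3Partitioner. The token values and helper names are made up for illustration.

```python
import hashlib
from bisect import bisect_left
from typing import List

# Hypothetical ring: 3 nodes, one token each, in a 0 .. 2**64 - 1 token space.
# A node owns the range (previous token, its token].
NODES = ["NODE_1", "NODE_2", "NODE_3"]
TOKENS = [2**64 // 3, 2 * (2**64 // 3), 2**64 - 1]


def token_for(partition_key: str) -> int:
    """Stand-in partitioner: first 8 bytes of the MD5 digest as an unsigned int."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(partition_key: str, rf: int) -> List[str]:
    """Owning node first, then the next rf - 1 nodes clockwise around the ring."""
    owner = bisect_left(TOKENS, token_for(partition_key)) % len(NODES)
    return [NODES[(owner + i) % len(NODES)] for i in range(rf)]


for key in ("user:1", "user:2", "user:3"):
    print(key, "->", replicas_for(key, rf=2))
```

    Each key maps to exactly 2 of the 3 nodes, and which 2 depends only on the key's token. That is why a live node is not necessarily a live replica for a given partition.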

    When you perform a QUORUM write against a cluster where only 2 of the 3 nodes are alive, there is a chance that only 1 replica for the partition key is alive; in that case the write request fails.
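    This failure case can be checked by enumerating the token ranges of a toy 3-node ring with one node down. The sketch assumes one token per node and SimpleStrategy-style placement (the owning node plus the next nodes clockwise); the helper names are hypothetical.

```python
NODES = ["NODE_1", "NODE_2", "NODE_3"]


def quorum(rf: int) -> int:
    """Replicas a QUORUM request must reach for replication factor rf."""
    return rf // 2 + 1


def replica_sets(rf: int) -> list:
    """One replica set per token range: the owning node plus the next rf - 1 nodes clockwise."""
    return [[NODES[(i + j) % len(NODES)] for j in range(rf)] for i in range(len(NODES))]


def writable_ranges(rf: int, down: set) -> int:
    """Count token ranges that still have enough live replicas for a QUORUM write."""
    return sum(
        1
        for replicas in replica_sets(rf)
        if sum(1 for node in replicas if node not in down) >= quorum(rf)
    )


for rf in (2, 3):
    ok = writable_ranges(rf, down={"NODE_2"})
    print(f"RF={rf}, NODE_2 down: {ok}/{len(NODES)} token ranges accept QUORUM writes")
```

    With RF = 2 only one of the three ranges (the one replicated on NODE_3 and NODE_1) still accepts QUORUM writes; with RF = 3 all three do.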

    In addition, here is a simple calculator for Cassandra parameters.