databasecassandranosql

How cassandra uses all shards for reading?


I need to understand how cassandra uses all shards for reading. I know that it uses consistent hashing to determine which node is responsible for data. But other nodes might contain the replica of the shard and I want to understand how the requests are shared. Does it use a second consistent hashing to select replica to send request?

I tried reading from multiple sources but got no clear understanding. but one thing is clear it uses all the shard replicas for reading.


Solution

  • Assuming a replication factor of 3, then 3 nodes will have the data. The consistent hash of the value in the row of the partition key is used to position the first of the three replicas on the token ring.

    The next 2 replicas are placed based on a set of rules - which are primarily that in a multi-rack scenario, the next 2 replicas are placed clockwise on the next 2 nodes which are on different racks to the first replica (and each other). In a single rack topology, the next 2 nodes in the ring are selected.

    The decision on which node to send the read requests to is based on the driver settings - more specifically the load balancing policy..

    The default policy is a token aware policy, meaning that it will send read requests to one of the three replicas to act as the co-ordinator. The default policy will spread the read queries across the 3 replicas.

    It should be remembered that this is the co-ordinator node for this specific query, and based on the consistency level requested, that co-ordinator node may need to contact other replica holders before returning the response to the application.