cassandra, cassandra-3.0, clustering-key

Performance of query with only partition key


Is the performance impacted if I provide only the partition key while querying a table containing both partition key and clustering key?

For example, for a table with partition key p1 and clustering key c1, would

SELECT * FROM table1 where p1 = 'abc';

be less efficient than

SELECT * FROM table1 where p1 = 'abc' and c1 >= 'some range start value' and c1 <= 'some range end value';

My goal is to fetch all rows with p1 = 'abc'.


Solution

  • The main cost of going to a particular row, rather than to the start of a partition, is the extra work of deserializing the clustering key index at the beginning of the partition. The following post is a bit old and based on Thrift, but the gist of it remains true:

    http://thelastpickle.com/blog/2011/07/04/Cassandra-Query-Plans.html (note: the row-level bloom filter has since been removed)

    When reading from the beginning of a partition you save that little bit of work, which improves latency.

    I wouldn't worry too much about it as long as your queries are not spanning multiple partitions. In that case you will generally only run into issues if the partitions grow to hundreds of MB or several GB in size.
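
If you want to see the difference on your own data, one option is to trace both queries in cqlsh and compare the reported steps and timings. This is only a sketch: the schema below is an assumption (the question names just p1 and c1, so the column types and the payload column v are hypothetical), the range bounds are placeholders, and it assumes a keyspace has already been selected with USE. TRACING ON and TRACING OFF are cqlsh shell commands rather than CQL statements.

```cql
-- Hypothetical schema: only p1 (partition key) and c1 (clustering key)
-- come from the question; the types and the column v are assumed.
CREATE TABLE IF NOT EXISTS table1 (
    p1 text,
    c1 text,
    v  text,
    PRIMARY KEY (p1, c1)
);

-- cqlsh command: print a trace of each following request
TRACING ON;

-- Reads the whole partition from its beginning
SELECT * FROM table1 WHERE p1 = 'abc';

-- Reads a slice of the same partition; placeholder bounds
SELECT * FROM table1 WHERE p1 = 'abc' AND c1 >= 'a' AND c1 <= 'z';

TRACING OFF;
```

Trace timings vary from run to run and with caching, so repeat each query a few times before drawing conclusions.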