cassandra

ORDER BY on SAI field not supported for single-partition reads


In the example table below an SAI index is used for the item_category field. The item_name column is the clustering key.

CREATE TABLE shopping_cart(
    id bigint,
    item_name text,
    item_category text,
    PRIMARY KEY (id, item_name)
) WITH CLUSTERING ORDER BY (item_name ASC);

CREATE INDEX shopping_cart_item_category_idx
ON shopping_cart (item_category)
USING 'sai';

An execution of a query such as

SELECT id, item_name, item_category 
    FROM shopping_cart 
    WHERE id = 12345                     -- partition key
    AND item_category = 'softdrinks'     -- 2ndary index

is successful. This query selects all softdrink items of the partition. The items in the result set are sorted according to the clustering key. See Query Result

However, the use of an ORDER BY clause in the query such as

SELECT id, item_name, item_category 
    FROM shopping_cart  
    WHERE id = 12345                     -- partition key
    AND item_category = 'softdrinks'     -- 2ndary index
    ORDER BY item_name ASC               -- clustering key

fails with the error message ORDER BY with 2ndary indexes is not supported, except for ANN queries. Since this query reads only a single partition, the query with ORDER BY should be allowed in my opinion.

The fact that a select with ORDER BY fails on multiple partitions, as shown below, is understandable to me as Cassandra does not support global SAI sorting, currently.

SELECT id, item_name, item_category 
    FROM shopping_cart 
    WHERE item_category = 'softdrinks'   -- 2ndary index
    ORDER BY item_name ASC               -- clustering key

Is the observed behavior merely an issue of the query parser?


Solution

  • You are correct. The error is caught during validation of the query in o.a.c.cql3.statements.SelectStatement.java:

      private static void verifyOrderingIsAllowed(StatementRestrictions restrictions, Map<ColumnMetadata, Ordering> orderingColumns) throws InvalidRequestException
      {
          if (orderingColumns.values().stream().anyMatch(o -> o.expression.hasNonClusteredOrdering()))
              return;
          checkFalse(restrictions.usesSecondaryIndexing(), "ORDER BY with 2ndary indexes is not supported, except for ANN queries.");
          checkFalse(restrictions.isKeyRange(), "ORDER BY is only supported when the partition key is restricted by an EQ or an IN.");
      }
    

    In general, ORDER BY for reads which are filtered by an indexed column are not allowed because it's impossible to know in advance whether the query will return 10 rows or thousands of rows. If the results are paged, the returned rows cannot be sorted since it needs to read all the rows.

    However, you're sorting by the clustering key anyway so it shouldn't make a difference since the results are going to be returned in the order that they are stored.

    Interestingly, I found a ticket requesting ORDER BY support for secondary indexes (CASSANDRA-14808) from 2018 but it doesn't seem to have had any traction probably due to lack of demand. Cheers!