In the example table below an SAI index is used for the item_category
field. The item_name
column is the clustering key.
CREATE TABLE shopping_cart(
id bigint,
item_name text,
item_category text,
PRIMARY KEY (id, item_name)
) WITH CLUSTERING ORDER BY (item_name ASC);
CREATE INDEX shopping_cart_item_category_idx
ON shopping_cart (item_category)
USING 'sai';
An execution of a query such as
SELECT id, item_name, item_category
FROM shopping_cart
WHERE id = 12345 -- partition key
AND item_category = 'softdrinks' -- 2ndary index
is successful. This query selects all softdrink items of the partition. The items in the result set are sorted according to the clustering key. See Query Result
However, the use of an ORDER BY
clause in the query such as
SELECT id, item_name, item_category
FROM shopping_cart
WHERE id = 12345 -- partition key
AND item_category = 'softdrinks' -- 2ndary index
ORDER BY item_name ASC -- clustering key
fails with the error message ORDER BY with 2ndary indexes is not supported, except for ANN queries
. Since this query reads only a single partition, the query with ORDER BY
should be allowed in my opinion.
The fact that a select with ORDER BY
fails on multiple partitions, as shown below, is understandable to me as Cassandra does not support global SAI sorting, currently.
SELECT id, item_name, item_category
FROM shopping_cart
WHERE item_category = 'softdrinks' -- 2ndary index
ORDER BY item_name ASC -- clustering key
Is the observed behavior merely an issue of the query parser?
You are correct. The error is caught during validation of the query in o.a.c.cql3.statements.SelectStatement.java
:
private static void verifyOrderingIsAllowed(StatementRestrictions restrictions, Map<ColumnMetadata, Ordering> orderingColumns) throws InvalidRequestException
{
if (orderingColumns.values().stream().anyMatch(o -> o.expression.hasNonClusteredOrdering()))
return;
checkFalse(restrictions.usesSecondaryIndexing(), "ORDER BY with 2ndary indexes is not supported, except for ANN queries.");
checkFalse(restrictions.isKeyRange(), "ORDER BY is only supported when the partition key is restricted by an EQ or an IN.");
}
In general, ORDER BY
for reads which are filtered by an indexed column are not allowed because it's impossible to know in advance whether the query will return 10 rows or thousands of rows. If the results are paged, the returned rows cannot be sorted since it needs to read all the rows.
However, you're sorting by the clustering key anyway so it shouldn't make a difference since the results are going to be returned in the order that they are stored.
Interestingly, I found a ticket requesting ORDER BY
support for secondary indexes (CASSANDRA-14808) from 2018 but it doesn't seem to have had any traction probably due to lack of demand. Cheers!