So I have a table that looks along the lines of:
CREATE TABLE schema.table (
partitionkey int,
clusterkey text,
value bigint,
PRIMARY KEY ((partitionkey), clusterkey)
);
We rely on this table to sort correctly by the clusterkey
for pagination.
Problem is: when returning results back from cassandra, it looks like they are sorted based on their ASCII value rather than a logical A-Z sorting. - which makes sense programatically but not logically for a person viewing it.
So a query of:
SELECT clusterkey FROM schema.table WHERE partitionkey = 1 ORDER BY clusterkey ASC;
get the results on the left, when I expect something like the one on the right
---------(current)--- ----------(wanted)---
clusterkey clusterkey
--------------------- ---------------------
Other capital a should be top
Starts capital1 Other capital
Starts capital2 Starts capital1
Starts capital3 starts capital1
Starts capital4 Starts capital2
YYYYYYYYYYYY Starts capital3
ZZZZZZZZZZZZ Starts capital4
a should be top YYYYYYYYYYYY
starts capital1 ZZZZZZZZZZZZ
zzzzzzzzzzzz zzzzzzzzzzzz
I understand that we could change the data to be all lowercase / UPPERCASE to sort properly but this would change the look of the data. - Which is definitely not wanted.
Is there an option to change the method of the current clustering order?
- or another way to sort logically?
Clustering columns are sorted in the lexicographic order which means uppercase characters have precedence over lowercase.
Case-insensitive ordering is not supported by Cassandra in current releases but there is an enhancement requested in Cassandra community Provide a locale/collation-aware text comparator so far it didn't get much attention from the developer community.