[SOLVED] Cassandra primary key design to cater range query

I have designed a column family:

prodgroup text, prodid int, status int, PRIMARY KEY ((prodgroup), prodid, status)

The data model needs to support the following:

Get the list of products in a product group
Get the list of products for a given range of ids
Get the details of a specific product
Update the status of a product (active/inactive)
Get the list of products that are active or inactive (select * from product where prodgroup='xyz' and prodid > 0 and status = 0)

The design works fine, except for the last query. Cassandra does not allow me to query on status unless I fix the product id. I think defining a super column family which has the key "PRIMARY KEY((prodgroup), status, prodid)" should work. I would like to get expert advice on other alternatives.
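For reference, a minimal sketch of that alternative key. The table name product_by_status is made up for illustration; the columns are the ones listed above. With status placed ahead of prodid in the clustering order, an equality on status followed by a range on prodid is a valid restriction:

```cql
-- Hypothetical table name; columns taken from the question.
CREATE TABLE product_by_status (
    prodgroup text,
    status int,
    prodid int,
    PRIMARY KEY ((prodgroup), status, prodid)
);

-- Equality on the first clustering column, range on the next one.
SELECT * FROM product_by_status
WHERE prodgroup = 'xyz' AND status = 0 AND prodid > 0;
```

Note that status is still part of the primary key in this layout.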

Solution

If you're providing the partition key (the product group in this case), then this is an ideal use case for a secondary index. Create a secondary index on status, and you'll be able to query for exact equality on status. Make sure you provide the partition key, because if you don't, the query will go to every single node in your cluster and will likely time out. If you do provide a partition key (which you seem to be doing), then a secondary index on status should allow you to carry out your query.

There is another issue: you have status as part of the primary key. This means that you CAN'T update it for a product. You can only create new rows for different statuses. This is likely not what you need. And if it is, you'll need to add ALLOW FILTERING to your last query (which won't be much of a performance hit because you've already filtered to the partition).
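To illustrate the point about updates, here is a rough sketch against the original schema. It assumes the table is named product, as in the question's query; the id 42 is only an example value. Because status is a primary key column, it cannot appear in a SET clause, so "changing" it means removing the old row and writing a new one:

```cql
-- Rejected by Cassandra: status is part of the primary key.
-- UPDATE product SET status = 1 WHERE prodgroup = 'xyz' AND prodid = 42;

-- Workaround: delete the old row and insert the new one.
-- Both statements touch the same partition.
BEGIN BATCH
    DELETE FROM product WHERE prodgroup = 'xyz' AND prodid = 42 AND status = 0;
    INSERT INTO product (prodgroup, prodid, status) VALUES ('xyz', 42, 1);
APPLY BATCH;
```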

For your use case, this is the schema and query I'd use. I believe it covers all the use cases:

create table products2(
    prodgroup text,
    prodid int,
    status int,
    primary key (prodgroup, prodid)
);

create index on products2 (status);

select * from products2 where prodgroup='groupname' and prodid>0 and status=0;

Since secondary index updates are atomic and managed by Cassandra, this will work nicely as long as you're hitting the partition.
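As a sketch of how the remaining use cases from the question might look against products2 (the group name and id values are placeholders):

```cql
-- All products in a product group.
SELECT * FROM products2 WHERE prodgroup = 'groupname';

-- Products for a given range of ids.
SELECT * FROM products2
WHERE prodgroup = 'groupname' AND prodid >= 100 AND prodid <= 200;

-- Details of a specific product.
SELECT * FROM products2 WHERE prodgroup = 'groupname' AND prodid = 42;

-- Update the status of a product. This works here because status
-- is a regular column rather than part of the primary key.
UPDATE products2 SET status = 1
WHERE prodgroup = 'groupname' AND prodid = 42;
```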

Hope that helps.