cassandradata-modelingcqlshsecondary-indexes

Best way to search Cassandra by Non primary key column


I am storing historical timeseries data into Cassandra.

cassandra@cqlsh>CREATE TABLE data."InstrumentTimeSeries" (
    key blob,
    column1 bigint,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND bloom_filter_fp_chance = 0.01
AND comment = ''
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'
AND caching = {
    'keys' : 'ALL',
    'rows_per_partition' : 'NONE'
}
AND compression = {
    'sstable_compression' : ''
}
AND compaction = {
    'class' : 'SizeTieredCompactionStrategy'
};

In column 'value' I am storing whole json after serializing that. Now I want to fetch records on base of some keys from json stored in value column.

  1. I can create a duplicate table by adding new clustering columns as per requirements. or
  2. I can think of Secondary INDEX

I have 20+ million of records in table.

What would be the best way to do that?


Solution

  • If my understanding is correct you store the price data as json in your value column. I think that basically the definition of a high-cardinality column. In this case it is not suggested to index that especially, if you have lots of records.

    You can find further explanation here: https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html#useWhenIndex__when-no-index

    I am not sure what you mean by creating a secondary table, but it could solve your problem. E.g if you plan to fully decompose your json into columns and then use indexing on the relevant ones. It would be easier to understand your problem if you could add some example to your post.