cassandra time-series consumption

Cassandra Skinny vs Wide Row for time series - consumption


I want to store one value per second in a table, so I tested two approaches against each other. If I have understood correctly, the data should be stored almost identically internally.

Wide Row

CREATE TABLE timeseries (
  id int,
  date date,
  timestamp timestamp,
  value decimal,
  PRIMARY KEY ((id, date), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC)
  AND compaction = {'class': 'DateTieredCompactionStrategy'}
  AND compression = {'sstable_compression': 'DeflateCompressor'};

Skinny Row

CREATE TABLE timeseries (
  id int,
  date date,
  "0" decimal, "1" decimal, "2" decimal, -- ... 86400 decimal values
                                         -- each column index is the second of the day
  PRIMARY KEY ((id, date))
);

Test:

[Image: results of the comparison in values]

In my test, the skinny row approach consumes only half of the storage for 1 million values of a sine function. Even in the random test the difference is significant. Can somebody explain this behaviour?


Solution

  • The only difference between these schemas is the cell key.

    A sample cell of the wide row model:

    ["2017-06-09 15\\:05+0600:value","3",1496999149885944]
                 |              |     |         |
       clustering timestamp  column value  write timestamp
    

    And a sample cell of the skinny row model:

       ["0","3",1497019292686908]
         |   |          |
      column value write timestamp
    

    As you can see, in the wide row model the cell key consists of the clustering timestamp value plus the column name (value), whereas in the skinny row model the cell key is only the column name.

    The overhead of the wide row model is therefore the timestamp (8 bytes) plus the length of the column name (value), repeated in every cell. You can keep the column name short and, instead of a timestamp, use an int (4 bytes) holding the second of the day, like the column names in your skinny row model. This will save more space.
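
    As a rough sketch of that suggestion, the wide row table could look something like the following. The table name timeseries_by_second and the short column names s and v are hypothetical, chosen only for illustration, and the compaction and compression options from the question are left out for brevity. The actual saving would have to be measured against your own data.

    ```sql
    -- Hypothetical variant of the wide row table with a smaller cell key.
    CREATE TABLE timeseries_by_second (
      id int,
      date date,
      s int,       -- second of the day (0 to 86399), replaces the 8-byte timestamp
      v decimal,   -- short column name instead of "value"
      PRIMARY KEY ((id, date), s)
    ) WITH CLUSTERING ORDER BY (s DESC);
    ```

    The partition key (id, date) is unchanged, so a partition still holds one day of data for one id; only the clustering column and the value column name shrink.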