i want to store every second one value to a table. Therefore i testet two approches against each other. If I have understood correctly, the data should be stored internally almost identical.
Wide-Row
CREATE TABLE timeseries (
id int,
date date,
timestamp timestamp,
value decimal,
PRIMARY KEY ((id, date), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC) AND
compaction={'class': 'DateTieredCompactionStrategy'}
and compression = { 'sstable_compression' : 'DeflateCompressor' };
Skinny Row
CREATE TABLE timeseries(
id int,
date date,
"0" decimal, "1" decimal,"2" decimal, -- ... 86400 decimal values
-- each column index is the second of the day
PRIMARY KEY ((id, date))
)
Test:
In my test the skinny row approach for a sinus function only consums half of the storage for 1 million values. Even the random test is significant. Can somebody explain this behaviour?
The only difference between these schema is the cell key
A sample cell of The wide row model :
["2017-06-09 15\\:05+0600:value","3",1496999149885944]
| | | |
timestamp column value timestamp
And A sample cell of the Skinny row model :
["0","3",1497019292686908]
| | |
column value timestamp
You can clearly see that wide row model cell key is timestamp value and column name of value. And for skinny model cell key is only column name.
The overhead of wide row model is the timestamp(8 bytes) and the size of column name (value).you can keep the column name small and instead of using timestamp, use int and put the seconds of the day, like your skinny row column name. This will save more space.