Tags: cassandra, data-modeling, cassandra-3.0, datamodel, snappy

Storing a single compressed JSON column rather than multiple columns in Cassandra?


The purpose of the table is to maintain audit records. Expected behaviour: heavy writes, infrequent reads, and no edits.

Table Definition:

PartitionKey(multiple columns), ClusteringKey(time-uuid), json BLOB;

The json column will hold Snappy-compressed bytes (byte[]). I am trying to store large, high-frequency data in this table while keeping the partition size as small as possible.
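To make the approach concrete, here is a minimal Python sketch of the application-side step: serialise an audit record to JSON, compress it, and produce the bytes that would go into the BLOB column. The standard-library zlib module stands in for Snappy (an assumption, so the sketch runs without extra packages), and the record fields and function names are hypothetical.

```python
import json
import zlib


def encode_audit(record: dict) -> bytes:
    """Serialise a record to compact JSON and compress it for a BLOB column."""
    raw = json.dumps(record, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return zlib.compress(raw)  # zlib is a stand-in for Snappy here


def decode_audit(blob: bytes) -> dict:
    """Reverse encode_audit: decompress the BLOB and parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical audit record; the field names are illustrative only.
record = {"user": "alice", "action": "LOGIN", "ip": "10.0.0.1", "ok": True}
blob = encode_audit(record)
restored = decode_audit(blob)
```

Every read has to go through the decode step, and the individual fields are opaque to CQL, which is the trade-off the question is asking about.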

Which would you recommend in Cassandra: a single BLOB column that holds the compressed JSON, or multiple columns?


Solution

  • I think it depends on your usage pattern. Once written, will the JSON data change at all? If it will, you'll end up having to rewrite the entire column value (which will be expensive in terms of temporary disk space used).

    In that case, using individual columns might be the better option. Remember that Cassandra lets you read and write rows as JSON (the INSERT ... JSON and SELECT JSON statements in CQL) while the table itself keeps ordinary typed columns. So if you end up going the individual-column route, that feature lets you keep your CRUD operations expressed in JSON.

    But if you're just writing once and updates are rare, then a JSON blob should be fine.

    Does Cassandra compress cell values?

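    Yes. By default, all created tables will use the LZ4Compressor. Note that compression is applied to chunks of SSTable data on disk rather than to each cell individually.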

    Is the data stored as a String or some compressed byte[] in the sstables?

    All data in Cassandra is stored (on disk) as serialized bytes; the hexadecimal form is only how tools display those bytes.

    Asking this to understand if it is worth compressing the JSON in the app server and storing it in Cassandra as a BLOB.

    I would not. Basically, you'd compress in the app layer only to have the result compressed again when Cassandra writes the SSTable to disk. You should be fine just letting Cassandra handle the compression.
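The double-compression point can be checked with a rough experiment. The sketch below is illustrative only: it uses Python's standard-library zlib as a stand-in for both the application-side Snappy step and Cassandra's LZ4 table compression (neither codec is in the standard library), and the audit records are synthetic with hypothetical field names. It only demonstrates the general effect that already-compressed bytes rarely shrink much on a second pass; real ratios depend on the data and the codec.

```python
import json
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic audit records; the field names are made up for illustration.
records = [
    {
        "event_id": "%032x" % rng.getrandbits(128),
        "user": "user-%d" % rng.randrange(1000),
        "action": rng.choice(["CREATE", "READ", "DELETE", "LOGIN"]),
        "ts": 1500000000 + rng.randrange(10_000_000),
    }
    for _ in range(500)
]
raw = json.dumps(records).encode("utf-8")

once = zlib.compress(raw)    # stand-in for compressing in the app layer
twice = zlib.compress(once)  # stand-in for the storage layer compressing again

print("raw bytes:       ", len(raw))
print("compressed once: ", len(once))
print("compressed twice:", len(twice))
```

If the last two numbers come out close to each other, the second pass is doing work for little or no gain, which is the argument for leaving compression to Cassandra alone.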