cassandra nosql datastax nodetool

Offline compaction/merging of multiple SSTables into one


$ cd /tmp
$ cp -r /var/lib/cassandra/data/keyspace/table-6e9e81a0808811e9ace14f79cedcfbc4 .
$ nodetool compact --user-defined table-6e9e81a0808811e9ace14f79cedcfbc4/*-Data.db

I expected the two SSTables (where the second one contains only tombstones) to be merged into one, which would be equivalent to the first one minus data masked by tombstones from the second one.

However, the last command exits with status 0 and nothing changes in the table-6e9e81a0808811e9ace14f79cedcfbc4 directory (both SSTables are still there). Any ideas on how to unconditionally merge multiple SSTables into one offline (as above, not on SSTable files currently used by the running cluster)?


Solution

  • Just run nodetool compact <keyspace> <table>. There is no real offline compaction; nodetool only tells Cassandra which SSTables to compact. A user-defined compaction just lets you give it a custom list of SSTables, whereas a major compaction (the command above) includes all SSTables in a table.

    Whether it will work really depends on which version you're using, but there is https://github.com/tolbertam/sstable-tools#compact available. If you're desperate, you can import cassandra-all for your version and do what it does: https://github.com/tolbertam/sstable-tools/blob/master/src/main/java/com/csforge/sstable/Compact.java
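
As a rough sketch of the two nodetool variants described above: the keyspace and table names are the placeholders from the question, and the data directory is assumed to be the node's live one rather than a copy under /tmp. The run helper is a hypothetical wrapper, not part of nodetool; it only prints each command unless DRY_RUN=0 is set, so the script can be inspected without a running node.

```shell
#!/bin/sh
# Sketch only: KEYSPACE, TABLE and DATA_DIR are placeholder values.
KEYSPACE="keyspace"
TABLE="table"
DATA_DIR="/var/lib/cassandra/data/$KEYSPACE/$TABLE-6e9e81a0808811e9ace14f79cedcfbc4"

# Hypothetical helper: print the command by default,
# execute it only when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Major compaction: ask the running node to compact all SSTables of the table.
run nodetool compact "$KEYSPACE" "$TABLE"

# User-defined compaction: pass an explicit list of Data.db files.
# Assumption: the files must be ones the running node is actually using.
run nodetool compact --user-defined "$DATA_DIR"/*-Data.db
```

Either way the work is done by the running Cassandra process, which would explain why pointing nodetool at a copy of the files appears to do nothing.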