cassandra nodetool

What is nodetool garbagecollect actually doing?


I'm trying to free some disk space in C*.
I've deleted many rows, which created many tombstones.
I'm running nodetool garbagecollect and was wondering what this tool is doing behind the scenes. I've read that it deletes the actual data that a tombstone is shadowing, but not the tombstones themselves (which will be cleared after gc_grace_seconds). Is that accurate? Does garbagecollect have no relationship with the gc_grace_seconds parameter? How does garbagecollect actually release disk space?

There is not a lot of documentation on how this tool works and what it does.

Any help will be much appreciated.


Solution

  • Deleting data in Cassandra always writes more data first (each delete adds a tombstone), so you need to be careful with it.

    nodetool garbagecollect performs single-sstable compactions to remove overwritten or logically deleted data. For each sstable, it creates a new sstable with the unneeded data cleaned out; disk space is released when the old sstable is replaced by the smaller one. By default, garbagecollect removes rows or partitions that have been deleted or overwritten with newer data. It can also remove deleted or overwritten cell values if the -g CELL option is specified, but this requires more resources (I/O and CPU). The command may also remove expired tombstones (older than gc_grace_seconds), but not fresh ones. There are also other limitations on tombstone removal: for example, a tombstone cannot be dropped while it may still shadow data in another sstable.

    If expired tombstones still exist afterwards, then only a major compaction may help to evict them, for example by running nodetool compact -s on the individual tables. Make sure you have enough free disk space first: roughly the same size as the table itself.
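
To make the points above concrete, here is a minimal Python sketch, not an official tool. It shows the gc_grace_seconds check as simple arithmetic, a rough free-space check before a major compaction, and how the two nodetool commands mentioned above are put together. All function names are hypothetical, and the table directory path in the usage note is an assumption about your data layout. The 864000 default (10 days) is Cassandra's default gc_grace_seconds; your table may use a different value.

```python
import os
import shutil
import time

DEFAULT_GC_GRACE_SECONDS = 864000  # Cassandra's default: 10 days


def tombstone_is_expired(deletion_time_s, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS, now_s=None):
    """Return True once a tombstone is older than gc_grace_seconds.

    Only expired tombstones are candidates for removal; fresh ones are kept.
    """
    now_s = time.time() if now_s is None else now_s
    return now_s - deletion_time_s > gc_grace_seconds


def table_size_bytes(table_dir):
    """Sum the sizes of the files directly inside table_dir.

    Subdirectories (for example snapshots/ and backups/) are skipped, because
    they usually hold hard links to sstables that would be counted twice.
    """
    total = 0
    with os.scandir(table_dir) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def has_room_for_major_compaction(table_dir, free_bytes=None):
    """Rough check: is free space at least the current size of the table?

    free_bytes can be passed in for testing; by default it is read from the
    filesystem that holds table_dir.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(table_dir).free
    return free_bytes >= table_size_bytes(table_dir)


def garbagecollect_cmd(keyspace, table, granularity="ROW"):
    """Build the argument list for nodetool garbagecollect (ROW or CELL)."""
    if granularity not in ("ROW", "CELL"):
        raise ValueError("granularity must be ROW or CELL")
    return ["nodetool", "garbagecollect", "-g", granularity, keyspace, table]


def major_compaction_cmd(keyspace, table):
    """Build the argument list for nodetool compact -s on a single table."""
    return ["nodetool", "compact", "-s", keyspace, table]
```

A typical use would be to call has_room_for_major_compaction() on the table's data directory (something like /var/lib/cassandra/data/my_ks/my_table-<id>, which is an assumed path) and only then pass major_compaction_cmd("my_ks", "my_table") to subprocess.run. The size check is deliberately conservative and approximate; it does not account for other compactions running at the same time.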