Cassandra nodetool has a command called cleanup:
cleanup [keyspace][cf_name]
Triggers the immediate cleanup of keys no longer belonging to this node. This has roughly the same effect on a node that a major compaction does in terms of a temporary increase in disk space usage and an increase in disk I/O. Optionally takes a list of column family names.
My questions are:
When will a node have keys that do not belong to it?
When you have added new nodes to the cluster, decreased replication factor or moved tokens.
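To make the first case concrete, here is a toy sketch of a token ring in Python. It is not Cassandra's actual partitioner or storage code: the `token` and `owner` helpers, the node names, the token values, and the replication factor of 1 are all assumptions made for illustration. It shows that after a new node joins, an existing node still holds keys on disk whose token range now belongs to the newcomer, which is the data cleanup removes.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Toy partitioner (assumption): map a key to a token in [0, 2**32).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def owner(ring: dict[int, str], key: str) -> str:
    # A key is owned by the node with the first token >= the key's token,
    # wrapping around to the lowest token at the end of the ring.
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, token(key))
    return ring[tokens[i % len(tokens)]]


# Hypothetical two-node ring, replication factor 1.
ring = {2**31: "node_a", 2**32 - 1: "node_b"}
keys = [f"user{i}" for i in range(1000)]

# What each node has written to disk before the topology change.
on_disk = {n: {k for k in keys if owner(ring, k) == n} for n in ring.values()}

# A new node joins and takes over the lower part of node_a's range.
ring[2**30] = "node_c"

# Keys still on disk at a node that no longer owns them.
stale = {n: {k for k in held if owner(ring, k) != n} for n, held in on_disk.items()}

for node, leftover in sorted(stale.items()):
    print(f"{node}: {len(leftover)} of {len(on_disk[node])} keys no longer belong here")
```

In this sketch only `node_a` ends up with leftover keys, because the new node took part of its range; `node_b` is unaffected.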
When should I issue a cleanup?
After one of the above operations, if you need to save disk space. There is no harm in delaying it: cleanup has a performance impact, and the only reason to run it is to save disk space.
Should I do cleanup regularly (e.g. once per week)?
No, only if you need to save space after one of the above operations.
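Since cleanup is a one-off step after a topology change rather than a scheduled job, it can be scripted when needed. The following is a minimal sketch, assuming `nodetool` is on the PATH and can reach each node with `-h`; the host addresses, the keyspace name, and the `cleanup_commands` and `run_cleanup` helpers are hypothetical. It runs one node at a time so that the extra disk I/O mentioned above does not hit the whole cluster at once.

```python
import subprocess


def cleanup_commands(hosts, keyspace, tables=()):
    # Build one "nodetool -h <host> cleanup <keyspace> [tables...]" command per node.
    return [["nodetool", "-h", host, "cleanup", keyspace, *tables] for host in hosts]


def run_cleanup(hosts, keyspace, tables=()):
    # Run the commands sequentially; check=True stops at the first failure.
    for cmd in cleanup_commands(hosts, keyspace, tables):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands (hypothetical hosts and keyspace).
    for cmd in cleanup_commands(["10.0.0.1", "10.0.0.2"], "my_keyspace"):
        print(" ".join(cmd))
```

Calling `run_cleanup` instead of printing would actually execute the commands, so it is worth checking the dry-run output first.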