cassandra, database-administration, nodetool

When should I run cleanup in Cassandra?


Cassandra nodetool has a command called cleanup:

cleanup [keyspace][cf_name]

Triggers the immediate cleanup of keys no longer belonging to this node. This has roughly the same effect on a node that a major compaction does in terms of a temporary increase in disk space usage and an increase in disk I/O. Optionally takes a list of column family names.

My questions are:

  1. When will a node have keys that do not belong to it?
  2. When should I issue a cleanup?
  3. Should I do cleanup regularly (e.g. once per week)?

Solution

  • When will a node have keys that do not belong to it?

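    When you have added new nodes to the cluster, decreased the replication factor, or moved tokens. Each of these reduces the set of token ranges a node is responsible for, but the data for the ranges it gave up stays on its disk until cleanup removes it.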

  • When should I issue a cleanup?

    After one of the above operations, if you need to reclaim disk space. There is no harm in delaying it - cleanup itself has a performance impact, and the only reason to run it is to free disk space.

  • Should I do cleanup regularly (e.g. once per week)?

    No, only if you need to save space after one of the above operations.
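
    As a rough illustration of the advice above, here is a minimal sketch of running cleanup once after a topology change, one node at a time so the extra disk I/O is not incurred on every node simultaneously. The function name cleanup_nodes, the DRY_RUN switch, and the keyspace and host values are hypothetical placeholders, not part of nodetool or of the original question; the sketch assumes nodetool is on the PATH and can reach each host with -h.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cassandra): run `nodetool cleanup`
# for one keyspace on each listed node, sequentially.
# Usage: cleanup_nodes <keyspace> <host> [<host> ...]
# Set DRY_RUN=1 to print the commands instead of executing them.
cleanup_nodes() {
    keyspace="$1"
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only show what would be run.
            echo "nodetool -h $host cleanup $keyspace"
        else
            # Stop at the first failure so a problem node is noticed
            # before cleanup is started anywhere else.
            nodetool -h "$host" cleanup "$keyspace" || return 1
        fi
    done
}

# Example (placeholder keyspace and addresses): after a new node has
# finished joining, clean up the nodes that were already in the cluster.
# DRY_RUN=1 cleanup_nodes my_keyspace 10.0.0.1 10.0.0.2
```

    Running the nodes sequentially is a design choice for this sketch: since cleanup temporarily raises disk usage and I/O much like a major compaction, staggering it keeps most of the cluster at normal load.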