cassandra nodetool

Is it safe to copy cassandra snapshot files over sstable files in a running node?


Edited after reading nodetool tagged questions.

We take daily snapshots of our single-node cassandra database. If I want to restore a snapshot, either on that node or on our staging server (which runs a different instance of cassandra), my understanding is that I have to:

  1. nodetool disablegossip

  2. nodetool disablebinary

  3. nodetool drain

  4. Copy the sstable files from the snapshot directories to the sstable directories under the keyspace directory.

  5. Run nodetool refresh on each table.

  6. Enable binary & gossip.

Is this sufficient to safely bring the snapshot sstable files in without cassandra overwriting them while I'm doing the refresh?

What is the opposite of nodetool drain?

Another edit: What about sstableloader? Should I use that instead? If so, how? I looked at the "documentation" and am none the wiser.


Solution

  • The steps you outlined aren't quite right. You don't need to shut down Cassandra, and you shouldn't just copy the files on top of the existing SSTables.

    At a high level, the steps to restore table snapshots on a node are:

    1. TRUNCATE the table you want to restore (this removes the table's SSTables from the data directories).
    2. Copy the SSTables from the data/ks_name/table-UUID/snapshots/snapshot_name subdirectory into the "live" data directory data/ks_name/table-UUID.
    3. Run nodetool refresh -- ks_name table_name.

    You will need to repeat these steps for each application table you want to restore. NOTE: Do NOT restore system tables, only application tables.
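
    As a rough illustration, the three steps above could be scripted per table along these lines. This is only a sketch under some assumptions: the function names and the dry_run flag are made up for the example, cqlsh and nodetool are assumed to be on the PATH and able to connect without extra options, and the snapshot directory is assumed to contain only files you want in the live directory (the sketch copies every regular file it finds there).

```python
import shutil
import subprocess
from pathlib import Path


def snapshot_files(table_dir: Path, snapshot_name: str) -> list[Path]:
    """List the regular files in table_dir/snapshots/snapshot_name."""
    snap_dir = table_dir / "snapshots" / snapshot_name
    return sorted(p for p in snap_dir.iterdir() if p.is_file())


def restore_table(table_dir: Path, keyspace: str, table: str,
                  snapshot_name: str, dry_run: bool = True) -> list[list[str]]:
    """Restore one table from a snapshot (hypothetical helper).

    table_dir is the live data directory, i.e. data/ks_name/table-UUID.
    Returns the external commands for steps 1 and 3. With dry_run=True
    nothing is executed and nothing is copied.
    """
    truncate = ["cqlsh", "-e", f"TRUNCATE {keyspace}.{table};"]
    refresh = ["nodetool", "refresh", "--", keyspace, table]

    # Check the snapshot first, so we never truncate a table
    # that we then cannot restore.
    files = snapshot_files(table_dir, snapshot_name)
    if not files:
        raise FileNotFoundError(f"no files in snapshot {snapshot_name!r}")

    if not dry_run:
        subprocess.run(truncate, check=True)       # step 1: TRUNCATE
        for f in files:                            # step 2: copy into live dir
            shutil.copy2(f, table_dir / f.name)
        subprocess.run(refresh, check=True)        # step 3: nodetool refresh
    return [truncate, refresh]
```

    Calling restore_table(Path("data/ks_name/table-UUID"), "ks_name", "table_name", "snapshot_name") only returns the commands it would run; pass dry_run=False to actually execute them, once per application table.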

    The detailed steps are documented in Restoring from a snapshot in Cassandra.

    I prefer to call restoring a snapshot into another cluster "cloning". The procedure for cloning snapshots to another cluster depends on whether the source and destination clusters have identical configuration.

    If both source and destination clusters are identical, follow the steps I documented here -- https://community.datastax.com/questions/4534/. I've explained what identical configuration means in this post.

    If they are not identical, follow the steps I documented here -- https://community.datastax.com/questions/4477/. Cheers!