Tags: cassandra, datastax, snapshot

Cassandra Snapshots and Compaction - are they mutually exclusive?


When we take a snapshot backup in Cassandra, it creates hard links to the existing SSTable files.

When a full compaction happens, the existing SSTable files are merged into new SSTable files.

So, what happens to the snapshot backup previously taken? Is the snapshot backup still valid?

$ cd <PATH TO CASSANDRA DATA DIRECTORY FOR KEYSPACE - MYKS AND TABLE - T1>

$ nodetool snapshot -t first_backup

$ ls -lrt
total 44
drwxr-xr-x 3 cassandra cassandra 4096 Jun 11 02:20 backups
-rw-r--r-- 2 cassandra cassandra  173 Jun 12 03:55 nb-15-big-Index.db
-rw-r--r-- 2 cassandra cassandra   32 Jun 12 03:55 nb-15-big-Filter.db
-rw-r--r-- 2 cassandra cassandra   62 Jun 12 03:55 nb-15-big-Summary.db
-rw-r--r-- 2 cassandra cassandra  776 Jun 12 03:55 nb-15-big-Data.db
-rw-r--r-- 2 cassandra cassandra    9 Jun 12 03:55 nb-15-big-Digest.crc32
-rw-r--r-- 2 cassandra cassandra   55 Jun 12 03:55 nb-15-big-CompressionInfo.db
-rw-r--r-- 2 cassandra cassandra 4879 Jun 12 03:55 nb-15-big-Statistics.db
-rw-r--r-- 2 cassandra cassandra   92 Jun 12 03:55 nb-15-big-TOC.txt
drwxr-xr-x 4 cassandra cassandra 4096 Jun 12 03:57 snapshots

$ cd <SNAPSHOT DIRECTORY FOR KEYSPACE - MYKS AND TABLE - T1>

$ ls -lrt
total 44
-rw-r--r-- 2 cassandra cassandra  173 Jun 12 03:55 nb-15-big-Index.db
-rw-r--r-- 2 cassandra cassandra   32 Jun 12 03:55 nb-15-big-Filter.db
-rw-r--r-- 2 cassandra cassandra   62 Jun 12 03:55 nb-15-big-Summary.db
-rw-r--r-- 2 cassandra cassandra  776 Jun 12 03:55 nb-15-big-Data.db
-rw-r--r-- 2 cassandra cassandra    9 Jun 12 03:55 nb-15-big-Digest.crc32
-rw-r--r-- 2 cassandra cassandra   55 Jun 12 03:55 nb-15-big-CompressionInfo.db
-rw-r--r-- 2 cassandra cassandra 4879 Jun 12 03:55 nb-15-big-Statistics.db
-rw-r--r-- 2 cassandra cassandra   92 Jun 12 03:55 nb-15-big-TOC.txt
-rw-r--r-- 1 cassandra cassandra 1012 Jun 12 03:57 schema.cql
-rw-r--r-- 1 cassandra cassandra   67 Jun 12 03:57 manifest.json

$ nodetool compact myks

$ cd <PATH TO CASSANDRA DATA DIRECTORY FOR KEYSPACE - MYKS AND TABLE - T1>

$ ls -lrt
total 44
drwxr-xr-x 3 cassandra cassandra 4096 Jun 11 02:20 backups
drwxr-xr-x 4 cassandra cassandra 4096 Jun 12 03:57 snapshots
-rw-r--r-- 1 cassandra cassandra  173 Jun 12 06:24 nb-16-big-Index.db
-rw-r--r-- 1 cassandra cassandra   32 Jun 12 06:24 nb-16-big-Filter.db
-rw-r--r-- 1 cassandra cassandra   62 Jun 12 06:24 nb-16-big-Summary.db
-rw-r--r-- 1 cassandra cassandra  776 Jun 12 06:24 nb-16-big-Data.db
-rw-r--r-- 1 cassandra cassandra    9 Jun 12 06:24 nb-16-big-Digest.crc32
-rw-r--r-- 1 cassandra cassandra   55 Jun 12 06:24 nb-16-big-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 4879 Jun 12 06:24 nb-16-big-Statistics.db
-rw-r--r-- 1 cassandra cassandra   92 Jun 12 06:24 nb-16-big-TOC.txt

## After this compaction, is the above snapshot still valid? Can it be used for a restore?


Solution

  • A snapshot is a backup of one specific point in time. Once you have taken it, whatever happens to the original files afterwards, including compaction, does not affect the snapshot, so it is still valid.

    You can use the snapshot to restore the table to the state it was in when the snapshot was taken.

    Hard links allow data that is stored once to be reached from more than one path. When compaction runs, it writes new SSTable files (nb-16-* in your listing) and deletes the old ones (nb-15-*) from the data directory, which removes only that directory's hard link to the data.

    The snapshot directory still holds its own hard link, so the files there remain unchanged. This follows from how inodes work: each file name is a link to an inode, and the filesystem frees the data only once every hard link to that inode has been removed. You can see this in your ls output, where the link count (the second column) is 2 for the nb-15 files before compaction and 1 for the new nb-16 files.
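
    The hard-link behaviour can be reproduced without Cassandra. The following is only a minimal sketch in a temporary directory: the live and snapshot directories and the file contents are made up for illustration, ln stands in for the snapshot, and cp followed by rm stands in for compaction. It is not what Cassandra runs internally.

```shell
#!/bin/sh
# Illustration only: plain files stand in for SSTables, no Cassandra involved.
set -eu

workdir=$(mktemp -d)
mkdir "$workdir/live" "$workdir/snapshot"

# A stand-in for an SSTable component in the live data directory.
printf 'row data\n' > "$workdir/live/nb-15-big-Data.db"

# "Snapshot": a second name (hard link) for the same inode.
ln "$workdir/live/nb-15-big-Data.db" "$workdir/snapshot/nb-15-big-Data.db"

# Link count is the second column of ls -l.
links_before=$(ls -l "$workdir/snapshot/nb-15-big-Data.db" | awk '{print $2}')

# "Compaction": write a new file, then delete the old one from the live directory.
cp "$workdir/live/nb-15-big-Data.db" "$workdir/live/nb-16-big-Data.db"
rm "$workdir/live/nb-15-big-Data.db"

links_after=$(ls -l "$workdir/snapshot/nb-15-big-Data.db" | awk '{print $2}')
snapshot_content=$(cat "$workdir/snapshot/nb-15-big-Data.db")

echo "link count before: $links_before, after: $links_after"
echo "snapshot still reads: $snapshot_content"

rm -rf "$workdir"
```

    The link count drops from 2 to 1 when the live copy is deleted, yet the snapshot path still returns the original bytes. On a real node, running ls -li in both the data directory and the snapshot directory before compaction should show matching inode numbers for the nb-15 files.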