I have a 3-node Cassandra v3.0.15 cluster with replication factor 3. My disk size is 2 TB and it's already 95% full. Almost 1.5 TB of the data is considered stale and must be deleted, retaining only the last 2 years of data. With the limited free disk space I have, how can I delete the data and reclaim disk space?
Other details:
The tables use LeveledCompactionStrategy, and 1 table uses DateTieredCompactionStrategy (with max_sstable_age_days = 365).
gc_grace_seconds = 10 days.
I've considered some approaches, but I hit a roadblock in all of them:
Reduce gc_grace_seconds to 0 and create a Python script that deletes the records in batches, with a pause between each batch.
(i) This could create a lot of tombstones and fill up the remaining disk space.
(ii) Reducing gc_grace_seconds to 0 doesn't ensure compaction will be triggered.
(iii) There is a table which uses DateTieredCompactionStrategy with max_sstable_age_days = 365, so SSTables older than 365 days (which are the ones of interest) will not be compacted.
Please suggest a safe approach that can be implemented.
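For reference, the batch-and-pause loop described in the approach above could look roughly like this. It is only a sketch: `delete_batch` is a hypothetical callable standing in for whatever actually issues the DELETE statements, and the batch size and pause are placeholder values. It does nothing about the tombstone and compaction roadblocks listed above.

```python
import time


def delete_in_batches(keys, delete_batch, batch_size=100,
                      pause_seconds=1.0, sleep=time.sleep):
    """Call delete_batch on chunks of keys, pausing after each full chunk.

    delete_batch is a hypothetical callable (an assumption, not a driver API)
    that deletes the rows for one list of keys.
    Returns the number of batches issued.
    """
    batch = []
    batches = 0
    for key in keys:
        batch.append(key)
        if len(batch) == batch_size:
            delete_batch(batch)
            batches += 1
            batch = []
            # Give the cluster time to absorb the writes before the next batch.
            sleep(pause_seconds)
    if batch:
        # Flush the final, partially filled batch.
        delete_batch(batch)
        batches += 1
    return batches
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays.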
If you do have SSTable files for user tables that were created more than 2 years ago, taking the nodes offline and manually removing those files will work. Once the nodes are back online and a repair is run, a percentage of the data will be reinstated, since the files removed on each node will not overlap perfectly.
You would need to do this on all the nodes before running the repair; otherwise the repair will reinstate all of the data.
It's not clear from the original post, though, whether the 1.5 TB you mention has already been identified as sitting in files that are more than 2 years old.
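One rough way to check is to list the Data.db files in a table directory whose modification time is older than the cutoff. The sketch below assumes file mtime is an acceptable proxy for data age, which is only approximately true (compaction rewrites files); the `sstablemetadata` tool shipped with Cassandra reports the minimum and maximum timestamps actually stored in an SSTable and is the more reliable check. The function name and the directory you pass in are illustrative.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_old_sstables(table_dir, max_age_days, now=None):
    """Return the *-Data.db files in table_dir last modified more than
    max_age_days ago, sorted by name.

    Assumption: mtime approximates the age of the data in the file.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    old = []
    for data_file in sorted(Path(table_dir).glob("*-Data.db")):
        if data_file.stat().st_mtime < cutoff:
            old.append(data_file)
    return old
```

Calling `find_old_sstables(table_dir, 730)` on each table directory and summing the file sizes would show how much of the 1.5 TB actually sits in files that old.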
Backing up the files you are going to delete to external / network storage before deleting them will at least give you a route back, in that you can take the nodes down again and add the files back.
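A minimal sketch of that backup-then-remove step, to be run only while the node is stopped. It assumes every component of an SSTable shares the file-name prefix of its Data.db file (for example `mc-1-big-`), and that `backup_dir` is a separate directory per table on the external storage; both function names are made up for illustration.

```python
import shutil
from pathlib import Path


def sstable_components(data_file):
    """Return all files sharing the SSTable prefix of data_file."""
    data_file = Path(data_file)
    prefix = data_file.name[: -len("Data.db")]  # e.g. "mc-1-big-"
    return sorted(data_file.parent.glob(prefix + "*"))


def backup_and_remove(data_file, backup_dir):
    """Copy every component of one SSTable to backup_dir, then delete it.

    Nothing is deleted unless every copy succeeded and matches in size.
    Returns the names of the files that were moved out.
    """
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    components = sstable_components(data_file)
    for src in components:
        dst = backup_dir / src.name
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if dst.stat().st_size != src.stat().st_size:
            raise IOError(f"size mismatch after copying {src}")
    for src in components:
        src.unlink()
    return [c.name for c in components]
```

Copying everything before deleting anything means a failed copy leaves the original SSTable intact.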