I searched the internet a lot and found many ways to back up and restore a Cassandra cluster, such as nodetool snapshot and Medusa. But my question is: can I use dsbulk to back up a Cassandra cluster? What are its limitations? Why doesn't anyone suggest it?
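It's possible to use it in some cases, but it's not practical. The primary reasons (the list could be longer):

dsbulk reads data through the normal read path, which puts additional load on the cluster nodes. In contrast, nodetool snapshot just creates hard links to the data files, so it puts no additional load on the nodes.

Incremental backups are harder to implement with dsbulk. To select only the data changed since the last backup you would need your own timestamp column, because you cannot filter in a WHERE clause on the writetime function. Plus it requires rescanning all of the data anyway. Plus it's impossible to find out which data was deleted. With nodetool snapshot, you just compare which files have changed since the last backup and back up only those.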
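The file comparison for nodetool snapshot can be sketched roughly as below. This is a minimal illustration, not how any particular backup tool does it: the function name files_to_back_up and the already_backed_up manifest (a set of relative path and size pairs recorded by the previous backup run) are hypothetical, and it assumes SSTable files are immutable once written, so a file already copied never needs to be copied again.

```python
from pathlib import Path


def files_to_back_up(snapshot_dir, already_backed_up):
    """Return the snapshot files that the previous backup did not contain.

    snapshot_dir: directory produced by `nodetool snapshot` (assumption:
        the caller passes the snapshot directory of one table).
    already_backed_up: set of (relative_path, size) tuples recorded by the
        previous backup run (hypothetical manifest format).
    """
    root = Path(snapshot_dir)
    new_files = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # SSTables are immutable, so path plus size is used here as a
        # cheap identity check instead of hashing the file contents.
        key = (path.relative_to(root).as_posix(), path.stat().st_size)
        if key not in already_backed_up:
            new_files.append(path)
    return new_files
```

Only the files returned here would need to be uploaded. No rows are read through the cluster, which is the point of the comparison with dsbulk.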