cassandra

I want to do capacity planning for our Cassandra cluster and have the questions below.


Q1: We have around 10 TB of data on a 5-node Cassandra (4.1) cluster. How much storage should I provision for the cluster in a production environment?

Q2: What is the best approach to adding space to the cluster if it runs short of disk?

Thanks.


Solution

  • If 10 TB is the size of a single replica, then at RF=3 you have 30 TB to split among the 5 nodes, or 6 TB each.

    That is already likely to be too much data per node, and the use of STCS (SizeTieredCompactionStrategy) at that density is potentially suspect. Nothing has been said about the data lifecycle, and the data will no doubt need to be removed at some point, either through explicit deletes or a TTL. STCS handles that poorly at this size: many large SSTables can go a long time without being recompacted, so deleted or expired data lingers on disk.

    Under STCS you may have to provision up to double the size of the data, so at 6 TB per node you would in theory need 12 TB of disk per node. That is far too simplistic an approach to rely on, though.

    I would investigate the details further before making any decisions:

    You need to provide substantially more context before any recommendation will stand up to scrutiny. The schema matters and the usage patterns matter; you cannot make capacity decisions from total data size and number of servers alone.
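
For reference, the back-of-the-envelope arithmetic in this answer can be written out as a short Python sketch. The function name `per_node_storage_tb` and the `headroom_factor` parameter are hypothetical, the factor of 2.0 is just the "double the space" rule of thumb for STCS mentioned above, and the calculation assumes data is spread evenly across nodes. As already noted, this is a rough starting point, not a sizing recommendation.

```python
def per_node_storage_tb(single_replica_tb, replication_factor, node_count,
                        headroom_factor=2.0):
    """Rough per-node sizing. Assumes data is spread evenly across nodes.

    headroom_factor=2.0 is the 'provision double' rule of thumb for STCS,
    not a measured value for any particular workload.
    """
    # Total data stored in the cluster across all replicas.
    total_tb = single_replica_tb * replication_factor
    # Share of that data held by each node.
    data_per_node_tb = total_tb / node_count
    # Disk to provision per node, leaving room for compaction.
    provisioned_per_node_tb = data_per_node_tb * headroom_factor
    return data_per_node_tb, provisioned_per_node_tb


# Figures from the question: 10 TB per replica, RF=3, 5 nodes.
data_tb, disk_tb = per_node_storage_tb(10, 3, 5)
print(f"data per node: {data_tb} TB, disk to provision per node: {disk_tb} TB")
```

Changing `node_count` shows how adding nodes lowers the data held per node, which is one reason the node count should not be treated as fixed when sizing.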