As we know from Cassandra's documentation [Link to doc], the partitioner should distribute data evenly across the nodes to avoid read hotspots. Cassandra offers several partitioners for this: Murmur3Partitioner, RandomPartitioner, and ByteOrderedPartitioner.
Murmur3Partitioner is Cassandra's default partitioner. It hashes the partition key into a 64-bit token in the range -2^63 to +2^63-1. My question is about data sets with different partition keys: one table might use a uuid as its partition key, another might use first name and last name, another a timestamp, and another a city name.
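(For illustration, CQL's built-in token() function shows the token the configured partitioner computed for each row; the keyspace, table, and column names below are made up.)

    -- Hypothetical table keyed by a uuid
    CREATE TABLE app.users (
        user_id uuid PRIMARY KEY,
        name    text
    );

    -- token() returns the Murmur3 token for each partition key,
    -- a 64-bit value between -2^63 and +2^63 - 1
    SELECT user_id, token(user_id) FROM app.users;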
Now assume a data set with city as the partition key, say:
Node 1 stores Houston data
Node 2 stores Chicago data
Node 3 stores Phoenix data, and so on...
If our data set then receives far more entries for Chicago than for other cities, Node 2 will hold most of the records and become a hotspot. In this scenario, how does Cassandra manage to distribute data evenly across these nodes?
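A sketch of the kind of table I mean (keyspace, table, and column names are hypothetical):

    -- city alone is the partition key; reading_time is a clustering column
    CREATE TABLE weather.readings (
        city         text,
        reading_time timestamp,
        temperature  double,
        PRIMARY KEY ((city), reading_time)
    );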
In short - it doesn't. The partitioner is a deterministic hash function, so the same partition key value always produces the same token, and therefore the same position on the ring. If you design a data model where 80% of the data shares one partition key, then 80% of the data will sit on the same 3 nodes (assuming RF = 3).
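You can see this with token() against the hypothetical table from the question: every row sharing a partition key value maps to exactly one token.

    -- Every row with city = 'Chicago' shows the same token, so all of
    -- that partition's data lives on the same RF replicas
    SELECT city, token(city) FROM weather.readings WHERE city = 'Chicago';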
Partition keys with high cardinality prevent this because they hash to many different tokens and locations on the ring. A low-cardinality value such as city is not a good partition key in any scenario beyond a very small data set.
The onus is on the developer to design a data model that uses suitably high-cardinality partition keys for the larger data sets to avoid hotspots.
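One common mitigation, sketched below against the hypothetical table from the question, is to add a bucketing column to the partition key so a hot value like 'Chicago' is split across many partitions; the trade-off is that reads must now supply the bucket as well.

    -- Composite partition key: each (city, day) pair hashes to its own
    -- token, so one city's data is spread across many partitions/nodes
    CREATE TABLE weather.readings_by_day (
        city         text,
        day          date,
        reading_time timestamp,
        temperature  double,
        PRIMARY KEY ((city, day), reading_time)
    );

    -- Reads now need both parts of the partition key
    SELECT * FROM weather.readings_by_day
     WHERE city = 'Chicago' AND day = '2023-07-01';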