cassandra scylla

Can read/write operations be skewed on a Scylla/Cassandra cluster if a phone number is used as the table's primary key?


I am running a Spark job that populates a table in Scylla. The table's primary key is of type long and essentially contains phone numbers. The values of this field will definitely not be uniformly distributed, since most phone numbers are between 9 and 12 digits long.

As for my Scylla configuration, it's a single cluster with 3 nodes and a replication factor of 3. My question is: will read/write operations be skewed per node because the primary key itself is skewed, or will Scylla choose the node after hashing the primary key, which would make operations uniform?

I have seen that the number of operations sometimes jumps for a particular node, but I want to be 100% sure before taking any steps.


Solution

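  • The node for a partition is determined by the Murmur3 hash of the partition key, which in this case is the single-column primary key. The distribution of the raw key values therefore does not decide which node they land on.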

    For example, the phone number 920-458-3834 gets hashed to a token of 3485763808729355786, and ends up being written to the node responsible for a token range that includes 3485763808729355786.

    Maybe the next phone number is similar to that one, say 920-458-3835. This gets hashed as -6305759902789073081, and in a large cluster will likely be written to a different node.

    "seen that number of operations sometimes jump for a particular node"

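    I'm not sure about the cluster or app setup, but it's possible that this node was acting as the coordinator for those requests. The coordinator is the node that receives a request from the client and forwards it to the replicas, so it can show more operations than its peers even when the data itself is evenly distributed. Also note that with 3 nodes and a replication factor of 3, every node holds a replica of every partition.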
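
To illustrate why near-identical phone numbers do not pile up on one node, here is a minimal Python sketch. It makes several assumptions that are not from the setup above: `stand_in_token` uses BLAKE2b from `hashlib` as a stand-in for the real Murmur3 partitioner (so its tokens will not match the ones quoted earlier), and `owner` pretends the ring is split into three equal contiguous ranges, which ignores vnodes and replication. The function names and the starting number are made up for illustration.

```python
import hashlib
from collections import Counter

MIN_TOKEN = -2**63   # smallest signed 64-bit token
RING_SIZE = 2**64    # number of possible tokens


def stand_in_token(phone_number: int) -> int:
    """Map a key to a signed 64-bit token.

    Assumption: BLAKE2b stands in for Murmur3 here; it is NOT the
    partitioner's real hash, only a well-mixing substitute.
    """
    key_bytes = phone_number.to_bytes(8, "big", signed=True)
    digest = hashlib.blake2b(key_bytes, digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def owner(token: int, node_count: int = 3) -> int:
    """Return the index of the primary owner of a token.

    Assumption: the ring is divided into equal contiguous ranges,
    one per node (a simplification of a real token ring).
    """
    return (token - MIN_TOKEN) * node_count // RING_SIZE


def primary_owner_counts(first_number: int, how_many: int,
                         node_count: int = 3) -> Counter:
    """Count how many consecutive keys fall into each node's range."""
    return Counter(
        owner(stand_in_token(first_number + i), node_count)
        for i in range(how_many)
    )


if __name__ == "__main__":
    # 30,000 consecutive, tightly clustered "phone numbers".
    counts = primary_owner_counts(9_204_580_000, 30_000)
    for node in sorted(counts):
        print(f"node {node}: {counts[node]} keys")
```

Running it should print three counts that are close to one another, even though the input keys are consecutive integers. The exact tokens on a real cluster will differ, but the same effect applies to any hash that mixes its input well.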