
Cassandra - best partition key for a fingerprint


I have a case where I want to store a 32-byte fingerprint as the primary key, plus one other integer column. At the moment the fingerprint is converted into a hex string and then stored in a Cassandra table. I am wondering, however, whether it would be better from a partitioning perspective to use a different key: maybe a blob of twelve bytes? A tuple, or a list of bytes? A single column?


Solution

  • From a partitioning perspective, it doesn't matter whether your partition key is a string of hex digits, a blob, or several integer columns. In all cases a hash function (Cassandra's variant of the Murmur3 hash function) converts the key's bytes into a 64-bit hash value known as a "token". As with any reasonably good hash function, the quality of the hash is the same whether you feed it compact input (the raw bytes) or redundant input (the same bytes spelled out as hex digits).

    The benefit of using a smaller key (a blob instead of a string of hex digits; the former is half the length of the latter) is that it uses less memory to hold these keys in Scylla's caches. It also uses less disk, although if you enable Scylla's compression, part of that size difference is recovered anyway. So in general, if you can use a blob, it's better to use it than to convert the fingerprint to a hex string. That's why the blob column type exists: to be used when it is suitable.
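To make both points concrete, here is a minimal Python sketch of how a partitioner turns key bytes into a token. It assumes Murmur3Partitioner (the default in both Cassandra and Scylla) and a single-column partition key, whose bytes are simply the serialized value: UTF-8 for a text column, the raw bytes for a blob. `murmur3_token` is a hypothetical helper, not a driver API; it reimplements the first 64 bits of MurmurHash3 x64_128 with seed 0, including Cassandra's quirk of sign-extending the trailing (non-16-byte-aligned) bytes. The fingerprint is produced with SHA-256 purely for illustration.

```python
import hashlib

MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit unsigned integer left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k: int) -> int:
    """MurmurHash3 64-bit finalization mix."""
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def _signed_byte(b: int) -> int:
    """Sign-extend a byte to 64 bits, as Java does for the tail bytes
    in Cassandra's MurmurHash (the 'Cassandra variant')."""
    return (b - 256 if b >= 128 else b) & MASK64


def _mix_k1(k1: int) -> int:
    k1 = (k1 * C1) & MASK64
    return (_rotl64(k1, 31) * C2) & MASK64


def _mix_k2(k2: int) -> int:
    k2 = (k2 * C2) & MASK64
    return (_rotl64(k2, 33) * C1) & MASK64


def murmur3_token(key: bytes) -> int:
    """Hypothetical helper: signed 64-bit token for serialized key bytes,
    assuming Murmur3Partitioner (x64_128, seed 0, first 64 bits)."""
    h1 = h2 = 0
    nblocks = len(key) // 16

    # Body: each 16-byte block is read as two little-endian 64-bit words.
    for i in range(nblocks):
        block = key[16 * i : 16 * i + 16]
        h1 ^= _mix_k1(int.from_bytes(block[:8], "little"))
        h1 = (_rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= _mix_k2(int.from_bytes(block[8:], "little"))
        h2 = (_rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64

    # Tail: the last 0..15 bytes, sign-extended like Cassandra's Java code.
    tail = key[16 * nblocks :]
    k1 = k2 = 0
    for j, b in enumerate(tail):
        if j < 8:
            k1 ^= (_signed_byte(b) << (8 * j)) & MASK64
        else:
            k2 ^= (_signed_byte(b) << (8 * (j - 8))) & MASK64
    if len(tail) > 8:
        h2 ^= _mix_k2(k2)
    if len(tail) > 0:
        h1 ^= _mix_k1(k1)

    # Finalization; only h1 (the first 64 bits) becomes the token.
    h1 ^= len(key)
    h2 ^= len(key)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = _fmix64(h1)
    h2 = _fmix64(h2)
    h1 = (h1 + h2) & MASK64

    # Interpret as a signed long; Long.MIN_VALUE is mapped to Long.MAX_VALUE.
    token = h1 - (1 << 64) if h1 >> 63 else h1
    return (1 << 63) - 1 if token == -(1 << 63) else token


# Hypothetical fingerprint: SHA-256 is used here only to get 32 bytes.
fingerprint = hashlib.sha256(b"example input").digest()

hex_key = fingerprint.hex().encode("ascii")  # bytes a text column would hash
blob_key = fingerprint                        # bytes a blob column would hash

print(len(hex_key), len(blob_key))            # 64 32
print(murmur3_token(hex_key), murmur3_token(blob_key))
print("0x" + fingerprint.hex())               # CQL blob literal for this key
```

Both encodings go through the same function, and both tokens can land anywhere in the signed 64-bit range, so neither spreads across the ring any better; the hex key is simply twice as many bytes to store and cache. A 32-byte blob is exactly two 16-byte blocks, so the sign-extension quirk never comes into play for it. For a multi-column partition key, Cassandra hashes a composite serialization of all the components (each one length-prefixed), which is why several integer columns distribute just as well.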