cockroachdb

Does column order of multi-column primary keys matter in CockroachDB?


CRDB documentation for secondary indexes clearly states column order matters:

Columns with a higher cardinality (higher number of distinct values) should be placed in the index before columns with a lower cardinality. If the cardinality of the columns you wish to add to the index are similar, test multiple column arrangements in a non-production environment to determine the most performant arrangement.

Does this guidance also apply to the composition of multi-column primary keys?


Solution

  • Yes, I would think so. IIUC, the main point to constructing a good multi-column primary key in CRDB is to ensure that the prefix to the primary key is unique enough so that data will be distributed "evenly enough". I believe that's what the docs here (https://www.cockroachlabs.com/docs/v23.2/performance-best-practices-overview#use-multi-column-primary-keys) are trying to get at below:

    By "enough randomness" we mean that the prefix of the primary key should be relatively uniformly distributed over its domain. Its domain should have at least as many elements as you have nodes.

    The caveat to this is that it doesn't look/sound like ordering a multi-column pk by cardinality is the "one and only rule" to constructing the best multi-column pk — if that is what you're asking.

    Out of curiosity, we can also test this theory out by doing something like:

    CREATE TABLE t (low BOOLEAN NOT NULL, very_high UUID DEFAULT gen_random_uuid() NOT NULL); 
    
    INSERT INTO t (low)                                                           
    SELECT i % 2 = 0                                                              
    FROM generate_series(1, 10000) AS s(i);                                          
    

    Following higher cardinality first rule:

    ALTER TABLE t ALTER PRIMARY KEY USING COLUMNS (very_high, low); 
    

    Not following rule:

    ALTER TABLE t ALTER PRIMARY KEY USING COLUMNS (low, very_high);
    

    I think that should roughly demonstrate it? Not sure the magnitude of rows to which this would best be represented