sqldatabasepostgresqldb2primary-key-design

Should we use sequences or identities for our primary keys?


We are creating a new database with 20+ tables, and our database supports:

So, the question is: should we use sequences or identities? Which one is better? The team seems to be divided on this one, so I wanted to hear pros and cons, to help decide.

Adding database details:


Solution

  • Your question is about using sequences versus identity ("generated always as identity" columns, presumably). In Postgres, these would be declared as serial. These would always be some sort of number in a single column.

    From the database performance perspective, there is not much difference between the two. One important difference is that some databases will cache identity columns, which can speed inserts but cause gaps. The rules for caching sequences might be different. In a high transaction environment, inadequate caching can be a performance bottleneck. Sharing a single sequence across multiple tables makes this problem worse.

    There is a bigger difference from a data management perspective. A sequence requires managing two objects (the table and the sequence). An identity or serial column is built into the table.

    For a single table, I have only considered using sequences in databases that do not support built-in auto-increment/serial/identity columns (ahem, "Oracle"). Otherwise, I would use the mechanism designed to work with tables.

    I do want to point out that using an auto-incremented surrogate key has other benefits. This should also be the key used for clustering the data, if such a concept exists in the database. New inserts are then always at the "end" (although if you are deleting data, then pages might only be partially used). The primary key should also be the only key used for foreign key references, even if other columns -- in isolation or together -- are unique and candidate primary keys.