Tags: database, database-design, primary-key, auto-increment, surrogate-key

Should a primary key be auto_increment?


It is good practice to define a primary key when designing tables.

But when designing a primary key, does it need to be auto_increment?

What's the benefit?

I have heard that it keeps B-trees stable, but why?

If a table already has a unique column, which is better: making that column the primary key, or adding a new id column as an auto_increment primary key?


Solution

  • When designing a primary key, is it necessary to make it auto_increment?

    No, it's not strictly necessary. There are cases when a natural key is fine.
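To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption (the question's auto_increment is MySQL's spelling; SQLite's closest equivalent is INTEGER PRIMARY KEY AUTOINCREMENT), and the users_surrogate/users_natural tables and the add_* helpers are hypothetical. With the surrogate key the database generates the id; with the natural key the caller supplies a value that is already unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Surrogate key: SQLite generates id when the INSERT omits it.
# (SQLite spelling; MySQL would write id INT AUTO_INCREMENT PRIMARY KEY.)
conn.execute("""
    CREATE TABLE users_surrogate (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        email TEXT NOT NULL UNIQUE
    )
""")

# Natural key: the unique, non-null email column is itself the primary key.
conn.execute("""
    CREATE TABLE users_natural (
        email TEXT NOT NULL PRIMARY KEY
    )
""")


def add_surrogate(email):
    cur = conn.execute("INSERT INTO users_surrogate (email) VALUES (?)", (email,))
    return cur.lastrowid  # the id the database just generated


def add_natural(email):
    conn.execute("INSERT INTO users_natural (email) VALUES (?)", (email,))
    return email  # the caller already knows the key


emails = ["a@example.com", "b@example.com"]
print([add_surrogate(e) for e in emails])  # [1, 2]
print([add_natural(e) for e in emails])    # ['a@example.com', 'b@example.com']
```

Note that the surrogate table still declares email UNIQUE: the generated id alone does nothing to stop duplicate emails.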

    If so, what's the benefit?

    Advantages of using an auto-increment surrogate key:

    Advantages of using a natural key:

    Other cases when a surrogate auto-increment key is not needed:

    I've heard that it keeps B-trees stable, but I don't know why.

    Inserting a value into an arbitrary place in the middle of a B-tree may cause a costly restructuring of the index.

    There's an animated example here: http://www.bluerwhite.org/btree/

    Look at the example "Inserting Key 33 into a B-Tree (w/ Split)", which shows the steps of inserting a value that overfills a B-tree node, and how the B-tree responds.

    Now imagine that the illustration shows only the bottom part of a much deeper B-tree (as would be the case for an index with millions of entries). The key pushed up into the parent node can overflow that node too, forcing the split to continue to the next higher level of the tree. This can continue all the way to the root if every ancestor node is already full.
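To see the cascade in code, here is a minimal in-memory B-tree insert sketch in Python that counts how many nodes a single insert splits. Assumptions: Node, insert and MAX_KEYS are hypothetical names; MAX_KEYS = 3 is deliberately tiny so splits appear quickly (real index pages hold far more keys); this is a classic B-tree that keeps keys in internal nodes, whereas many database indexes use the B+-tree variant; and nothing is written to disk pages. Keys 1 to 13 are inserted in order only so the output is predictable.

```python
from bisect import bisect_left, insort

MAX_KEYS = 3  # hypothetical tiny node capacity so splits show up quickly


class Node:
    """A B-tree node: sorted keys plus, for internal nodes, len(keys)+1 children."""

    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []  # [] means leaf

    def is_leaf(self):
        return not self.children


def _split(node):
    """Split an overfull node in place; return (separator, new right sibling)."""
    mid = len(node.keys) // 2
    separator = node.keys[mid]  # this key moves up into the parent
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return separator, right


def _insert(node, key, stats):
    """Insert below node; return (separator, right) if node itself split."""
    if node.is_leaf():
        insort(node.keys, key)
    else:
        i = bisect_left(node.keys, key)
        promoted = _insert(node.children[i], key, stats)
        if promoted is not None:  # the child split: absorb its separator here
            separator, right = promoted
            node.keys.insert(i, separator)
            node.children.insert(i + 1, right)
    if len(node.keys) > MAX_KEYS:  # overflow, possibly caused by a child split
        stats["splits"] += 1
        return _split(node)
    return None


def insert(root, key):
    """Insert key; return (possibly new root, number of node splits caused)."""
    stats = {"splits": 0}
    promoted = _insert(root, key, stats)
    if promoted is not None:  # the root split, so the tree grows one level
        separator, right = promoted
        root = Node([separator], [root, right])
    return root, stats["splits"]


root = Node()
for k in range(1, 14):
    root, splits = insert(root, k)
    if splits:
        print(f"insert {k}: {splits} split(s)")
# insert 4: 1 split(s)
# insert 7: 1 split(s)
# insert 10: 1 split(s)
# insert 13: 2 split(s)
print(root.keys)  # [9]
```

Inserting 13 splits a full leaf; the separator pushed up overflows the already-full root, which splits as well, and the tree grows a level.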

    As nodes split and are restructured, they may need more space, but they are stored on pages of the database file that have no spare space. So the storage engine has to relocate parts of the index to another part of the file, potentially rewriting many index pages for a single INSERT.

    Auto-increment values are always inserted at the rightmost edge of the B-tree. As @BrankoDimitrijevic points out in a comment below, this does not make them any less likely to cause such laborious node splitting and restructuring of the index. However, B-tree implementations can optimize for this case in other ways, and some do.
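One kind of optimization is sketched below as a simplified, hypothetical model, not any particular engine's algorithm: when the rightmost page is full and receives a key larger than every key it holds, start a new page for that key instead of splitting the full page in half. The model tracks only leaf pages (parent levels are ignored), and CAPACITY and build_leaves are hypothetical names.

```python
from bisect import bisect_right, insort

CAPACITY = 4  # hypothetical number of keys that fit on one leaf page


def build_leaves(keys, rightmost_optimization):
    """Insert keys one at a time into a chain of sorted leaf pages.

    Returns the pages as a list of sorted key lists. Only leaf pages are
    modelled; the parent levels of the B-tree are ignored.
    """
    leaves = []
    for key in keys:
        if not leaves:
            leaves.append([key])
            continue
        # Target page: the last page whose first key is <= key (else page 0).
        firsts = [leaf[0] for leaf in leaves]
        i = max(bisect_right(firsts, key) - 1, 0)
        leaf = leaves[i]
        if len(leaf) < CAPACITY:
            insort(leaf, key)  # room left on the page: no split
        elif rightmost_optimization and i == len(leaves) - 1 and key > leaf[-1]:
            # Appending past the end of the rightmost page: leave the full
            # page as it is and start a new page holding only the new key.
            leaves.append([key])
        else:
            # Classic 50/50 split: the upper half moves to a new page.
            insort(leaf, key)
            mid = len(leaf) // 2
            leaves.insert(i + 1, leaf[mid:])
            del leaf[mid:]
    return leaves


# Ascending keys, as an auto-increment column would produce.
for opt in (False, True):
    pages = build_leaves(range(1, 21), opt)
    fill = sum(len(p) for p in pages) / (len(pages) * CAPACITY)
    print(f"rightmost_optimization={opt}: {len(pages)} pages, {fill:.0%} full")
# rightmost_optimization=False: 9 pages, 56% full
# rightmost_optimization=True: 5 pages, 100% full
```

With the classic split, each split leaves a half-full page behind that ascending keys never revisit, so the pages stay about half full; the optimized variant leaves only full pages behind.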

    If a table has a unique column, which is better: making that unique column the primary key, or adding a new 'id' column as an auto_increment primary key?

    If the unique column is also non-nullable, then you can use it as a primary key. Primary keys require that all of their columns are non-nullable.
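A minimal sketch of the natural-key option, assuming SQLite through Python's sqlite3 module (an assumption; the books table and the try_insert helper are hypothetical). It shows the primary key rejecting both a duplicate value and a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table whose natural key is a unique, never-NULL isbn column.
# NOT NULL is written out explicitly: for historical reasons SQLite otherwise
# accepts NULL in a PRIMARY KEY column of an ordinary (rowid) table.
conn.execute("CREATE TABLE books (isbn TEXT NOT NULL PRIMARY KEY, title TEXT)")


def try_insert(isbn, title):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO books (isbn, title) VALUES (?, ?)", (isbn, title))
        return True
    except sqlite3.IntegrityError:  # raised for both UNIQUE and NOT NULL failures
        return False


print(try_insert("isbn-001", "First"))   # True
print(try_insert("isbn-001", "Copy"))    # False: duplicate primary key
print(try_insert(None, "Missing isbn"))  # False: NULL primary key
```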