primary-key auto-increment surrogate-key

Natural key vs auto_increment key as the primary key


My question is about using a natural key versus an auto_increment integer as the primary key.

For example, I have tables A, B and A_B_relation. A and B hold some kind of object, and A_B_relation records the many-to-many relation between A and B.

Both A and B have their own globally unique id, such as a UUID. The UUID is exposed to users, which means a user may query A or B by UUID.

There are two ways to design the tables' primary keys:

  1. Use an auto_increment integer. A_B_relation references the integer as its FK.
  2. Use the UUID. A_B_relation references the UUID as its FK.

For example, suppose a user wants to query the info of all the B's associated with an A, given A's UUID.

For the first case, the query flow is:

First, look up A's integer primary key by UUID in `A`.

Then, fetch the integer primary keys of all related B's from `A_B_relation`.

Finally, fetch the info of those B's from `B`.
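As a rough sketch of this first flow, here is how it might look with SQLite through Python's `sqlite3` module. The database engine, the column names (`id`, `uuid`, `name`), the sample rows and the helper `b_names_for_a` are all assumptions made for illustration. The three steps can be written as one query with two joins:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# Design 1 (hypothetical columns): integer surrogate PK, UUID kept as a unique column.
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY AUTOINCREMENT, uuid TEXT NOT NULL UNIQUE, name TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY AUTOINCREMENT, uuid TEXT NOT NULL UNIQUE, name TEXT);
CREATE TABLE a_b_relation (
    a_id INTEGER NOT NULL REFERENCES a(id),
    b_id INTEGER NOT NULL REFERENCES b(id),
    PRIMARY KEY (a_id, b_id)
);
""")

# Sample data: one A linked to b1 and b2, plus an unrelated b3.
a_uuid = str(uuid.uuid4())
a_id = conn.execute("INSERT INTO a (uuid, name) VALUES (?, ?)", (a_uuid, "a1")).lastrowid
for name in ("b1", "b2"):
    b_id = conn.execute(
        "INSERT INTO b (uuid, name) VALUES (?, ?)", (str(uuid.uuid4()), name)
    ).lastrowid
    conn.execute("INSERT INTO a_b_relation (a_id, b_id) VALUES (?, ?)", (a_id, b_id))
conn.execute("INSERT INTO b (uuid, name) VALUES (?, ?)", (str(uuid.uuid4()), "b3"))


def b_names_for_a(conn, a_uuid):
    """Resolve A's UUID to its integer id, then follow the relation to B."""
    rows = conn.execute(
        """
        SELECT b.name
        FROM a
        JOIN a_b_relation AS r ON r.a_id = a.id
        JOIN b ON b.id = r.b_id
        WHERE a.uuid = ?
        ORDER BY b.name
        """,
        (a_uuid,),
    ).fetchall()
    return [name for (name,) in rows]


print(b_names_for_a(conn, a_uuid))  # prints ['b1', 'b2']
```

So the extra lookup in `A` does not have to be a separate round trip; it becomes one more join in the same statement.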

For the second case, the flow is:

Fetch the UUIDs of all related B's from `A_B_relation` by A's UUID.

Fetch the info of those B's from `B`.
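A comparable sketch of this second flow, again assuming SQLite via Python's `sqlite3` and hypothetical column names (`uuid`, `name`, `a_uuid`, `b_uuid`). Because the relation table already stores A's UUID, the query can start from `A_B_relation` and needs only one join:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# Design 2 (hypothetical columns): the UUID itself is the primary key.
conn.executescript("""
CREATE TABLE a (uuid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE b (uuid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE a_b_relation (
    a_uuid TEXT NOT NULL REFERENCES a(uuid),
    b_uuid TEXT NOT NULL REFERENCES b(uuid),
    PRIMARY KEY (a_uuid, b_uuid)
);
""")

# Sample data: one A linked to b1 and b2, plus an unrelated b3.
a_uuid = str(uuid.uuid4())
conn.execute("INSERT INTO a (uuid, name) VALUES (?, ?)", (a_uuid, "a1"))
for name in ("b1", "b2"):
    b_uuid = str(uuid.uuid4())
    conn.execute("INSERT INTO b (uuid, name) VALUES (?, ?)", (b_uuid, name))
    conn.execute(
        "INSERT INTO a_b_relation (a_uuid, b_uuid) VALUES (?, ?)", (a_uuid, b_uuid)
    )
conn.execute("INSERT INTO b (uuid, name) VALUES (?, ?)", (str(uuid.uuid4()), "b3"))


def b_names_for_a_uuid(conn, a_uuid):
    """Filter the relation table by A's UUID directly, then join to B."""
    rows = conn.execute(
        """
        SELECT b.name
        FROM a_b_relation AS r
        JOIN b ON b.uuid = r.b_uuid
        WHERE r.a_uuid = ?
        ORDER BY b.name
        """,
        (a_uuid,),
    ).fetchall()
    return [name for (name,) in rows]


print(b_names_for_a_uuid(conn, a_uuid))  # prints ['b1', 'b2']
```

Table `A` is never touched here, at the cost of storing two UUID strings per row in the relation table instead of two integers.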

So I think the second approach is more convenient. Is this right? What are the drawbacks of the second approach?


Solution

  • In my opinion, whether a natural key or an auto-increment key is more convenient depends on the solution you are building. Both approaches have pros and cons. So the best course is to understand both key types properly, analyze what kind of business solution you are trying to provide, and select the appropriate primary key type.

    A natural key is a column or a set of columns that can be used to uniquely identify a record in a table. These columns contain real data that is related to the rest of the columns of the table.

    An auto-incremented key, also called a surrogate key, is a single table column that contains unique numeric values, each of which identifies a single row of the table. These values are generated at run time when a record is inserted into the table and have no relationship with the rest of the data in the row.

    The main advantage of natural keys is that they carry their own meaning and can require fewer joins: a foreign key column that holds the natural key already contains the value you want, whereas with a surrogate key we would have to join to the referenced table to get the same result.
    But say we cannot get all the data required from a single table and have to join with another table anyway. Then it is more convenient to use a surrogate key, because most of the time natural keys are strings and larger than surrogate keys, and joining tables on larger values takes more time.

    A natural key has its own meaning. So when it comes to searching records, natural keys have an advantage over surrogate keys. But say that over time our program logic changes and we have to change a natural key value. This will be difficult and will cascade through all foreign key relationships. We can avoid this problem by using a surrogate key: since a surrogate key has no relationship with the rest of the values in a row, changes in the logic won't affect the surrogate key.

    So, as I see it, the convenience or inconvenience of using a surrogate key or a natural key depends entirely on the solution you are providing.
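To make the cascade effect mentioned above concrete, here is a minimal sketch using SQLite through Python's `sqlite3`. The tables (`customer_nat`, `order_nat`, `customer_sur`, `order_sur`) and the choice of an e-mail address as the natural key are hypothetical, picked only for illustration. Changing the natural key forces every referencing row to be rewritten, while the surrogate-key design leaves the referencing rows untouched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is enabled

conn.executescript("""
-- Natural-key design (hypothetical): the e-mail address is the primary key.
CREATE TABLE customer_nat (email TEXT PRIMARY KEY, name TEXT);
CREATE TABLE order_nat (
    order_no INTEGER PRIMARY KEY,
    customer_email TEXT NOT NULL
        REFERENCES customer_nat(email) ON UPDATE CASCADE
);
INSERT INTO customer_nat VALUES ('old@example.com', 'Ann');
INSERT INTO order_nat VALUES (1, 'old@example.com'), (2, 'old@example.com');

-- Surrogate-key design (hypothetical): the e-mail is an ordinary unique column.
CREATE TABLE customer_sur (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT NOT NULL UNIQUE,
    name TEXT
);
CREATE TABLE order_sur (
    order_no INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer_sur(id)
);
INSERT INTO customer_sur (email, name) VALUES ('old@example.com', 'Ann');
INSERT INTO order_sur VALUES (1, 1), (2, 1);
""")

# Natural key: the change cascades into every referencing row.
conn.execute(
    "UPDATE customer_nat SET email = 'new@example.com' WHERE email = 'old@example.com'"
)
nat_refs = [r[0] for r in conn.execute(
    "SELECT customer_email FROM order_nat ORDER BY order_no")]

# Surrogate key: only the customer row changes; the references stay as they were.
sur_before = [r[0] for r in conn.execute(
    "SELECT customer_id FROM order_sur ORDER BY order_no")]
conn.execute(
    "UPDATE customer_sur SET email = 'new@example.com' WHERE email = 'old@example.com'"
)
sur_after = [r[0] for r in conn.execute(
    "SELECT customer_id FROM order_sur ORDER BY order_no")]

print(nat_refs)                 # prints ['new@example.com', 'new@example.com']
print(sur_before == sur_after)  # prints True
```

With only two referencing rows the difference is invisible, but the same update against a large child table means rewriting and re-indexing every matching row.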