In my web application, the user can define documents and give them a unique name that identifies that document and a friendly name that a human will use to refer to the document. Take the following table schema as an example:
| id | name | friendly_name |
-----------------------------------------------
| 2 | invoice-2 | Invoice 2 |
In this example I've used the id
column as the primary key, which is an auto incrementing number. Since there's already a natural ID for documents (name
) I could also do this:
| name | friendly_name |
--------------------------------------
| invoice-2 | Invoice 2 |
In this example, name
is the primary key of the document. We've eliminated the id
field as it's essentially just a duplicate of name
, since every document in the table must have a unique name
anyway.
This would also mean that when I refer to a document from a foreign key relationship I'd have to call it document_name
rather than document_id
.
What's the best practice regarding this? Theoretically it's entirely possible for me to use a VARCHAR
for the primary key, but does it come with any downsides such as performance overhead?
There are two schools of thought on this topic.
There are some who hold strongly to the belief that using a "natural key" as the primary key for an entity table is desirable, because it has significant advantages over a surrogate key.
The are others that believe that a "surrogate" key can provide some desirable properties which a "natural" key may not.
Let's summarize some of the most important and desirable properties of a primary key:
(There are some other properties that can be listed, but some of those properties can be derived from the properties above (not null, can be indexed, etc.)
I break the two schools of thought regarding "natural" and "surrogate" keys as the "best" primary keys into two camps:
1) Those who have been badly burned by an earlier decision to elect a natural key as the primary key, and
2) Those who have not yet been burned by that decision.