database primary key


I have a user table with hundreds of millions of rows and a field username (varchar). Should I make it the primary key instead of putting a unique index on it? What are the advantages or disadvantages of adding an extra field user_id (int) and making that the primary key instead? I don't see where I would use user_id except, say, in a join condition, where a join on an int would be faster than a join on a varchar. Or would it, since both fields are indexed?

Update: assume changing username is not an option.


Solution

  • First of all, I second Frederik's comment: I am a firm believer in not ascribing any business or functional value to the primary key of a table. There may not be the option of changing the username now, but perhaps there will be later. Even if not, it's better to get into the habit and be consistent with all of your tables rather than mixing paradigms.

    A secondary reason for using a numeric (or otherwise sequential) primary key is insert and update speed. In SQL Server, for example, the primary key is by default also the clustered index (this can be changed), and in MySQL's InnoDB it always is. The clustered index determines the order in which the rows are stored, so a value that arrives out of order has to be placed in the middle of the existing data; when the target page is full, the engine has to split it and move rows to make room. With effectively random keys such as usernames, this happens throughout the index rather than only at the end, and on a table with millions of rows that can make inserts and updates non-trivially expensive.
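
To make the layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names other than username and user_id (orders, order_id, item) and the helper functions are made up for illustration, and SQLite is only a stand-in for whatever engine you actually run. The idea is that user_id is a meaningless surrogate primary key, username keeps its uniqueness through a unique index, and other tables reference user_id only.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a surrogate-key users table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (
            user_id  INTEGER PRIMARY KEY,   -- surrogate key, no business meaning
            username TEXT NOT NULL UNIQUE   -- natural key, kept unique by an index
        );
        CREATE TABLE orders (               -- hypothetical child table
            order_id INTEGER PRIMARY KEY,
            user_id  INTEGER NOT NULL REFERENCES users(user_id),
            item     TEXT NOT NULL
        );
        """
    )
    return conn


def add_user(conn, username):
    """Insert a user and return the user_id the database assigned."""
    cur = conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
    return cur.lastrowid


def rename_user(conn, user_id, new_username):
    """Change a username; rows in other tables are untouched."""
    conn.execute(
        "UPDATE users SET username = ? WHERE user_id = ?",
        (new_username, user_id),
    )


def items_for(conn, username):
    """Look up a user's items by joining on the integer key."""
    rows = conn.execute(
        "SELECT o.item FROM orders AS o "
        "JOIN users AS u ON u.user_id = o.user_id "
        "WHERE u.username = ? ORDER BY o.order_id",
        (username,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo()
# Usernames arrive in no particular order; the assigned ids still only grow.
zed = add_user(conn, "zed")
amy = add_user(conn, "amy")
conn.execute("INSERT INTO orders (user_id, item) VALUES (?, ?)", (amy, "book"))
conn.execute("INSERT INTO orders (user_id, item) VALUES (?, ?)", (zed, "lamp"))

# If usernames ever do become editable, only one row in users changes.
rename_user(conn, amy, "amy2")
print(items_for(conn, "amy2"))
```

Because the ids are handed out in increasing order, new rows always land at the end of the primary key index no matter what the username is, and a later rename touches a single row in users instead of every table that refers to that user.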