I have a table for recording license usage. Each license usage needs to be associated with a user and a host machine. The table definition looks like this.
create table if not exists per_user_fact
(
per_user_fact_id int unsigned not null auto_increment,
time_of_day char(16) not null,
license_served_id smallint unsigned not null,
license_hours numeric(10,2) not null,
role_name varchar(64) null,
user varchar(128) not null,
host varchar(128) not null,
primary key (per_user_fact_id),
foreign key (license_served_id) references served_license(served_license_id),
foreign key (user, host) references user_host(username, hostname)
);
I want to normalize this table so that the user/host values, which are repetitive, are moved to a new table like this.
create table if not exists user_host
(
username varchar(64) not null,
hostname varchar(128) not null,
primary key (username, hostname)
);
For the user_host table, what kind of primary key should I pick up - natural or surrogate? I can think of the following governing factors.
Please advise.
I am a big fan of using surrogate primary keys, even in this case. The cost of the additional join is negligible when you are joining to a clustered primary key.
In addition, assuming that username
and hostname
are (together) longer than four or so characters, the surrogate key saves space. In fact, you might find that a surrogate key results in faster queries because the data in per_user_fact
is smaller. Smaller tables occupy fewer data pages, resulting in fewer I/Os.
Another advantage of a surrogate key is the opportunity to change user names and host names without modifying any other tables. If you are using data fields for joins, then modifying the values requires updating multiple tables -- a more cumbersome operation.
I also like surrogate identity/serial/auto increment keys because they also capture insertion order in the table. Of course, there are other methods (my tables typically have a CreatedAt
column which defaults to the insertion time). But, the surrogate key can also play that role.
These reasons do not constitute a "right" answer to the question. There are valid reasons for not using surrogates. For me, though, almost all my tables have such primary keys.