mysqldatabasenullrelational-model

Unique key with NULLs


This question requires some hypothetical background. Let's consider an employee table that has columns name, date_of_birth, title, salary, using MySQL as the RDBMS. Since if any given person has the same name and birth date as another person, they are, by definition, the same person (barring amazing coincidences where we have two people named Abraham Lincoln born on February 12, 1809), we'll put a unique key on name and date_of_birth that means "don't store the same person twice." Now consider this data:

id name        date_of_birth title          salary
 1 John Smith  1960-10-02    President      500,000
 2 Jane Doe    1982-05-05    Accountant      80,000
 3 Jim Johnson NULL          Office Manager  40,000
 4 Tim Smith   1899-04-11    Janitor         95,000

If I now try to run the following statement, it should and will fail:

INSERT INTO employee (name, date_of_birth, title, salary)
VALUES ('Tim Smith', '1899-04-11', 'Janitor', '95,000')

If I try this one, it will succeed:

INSERT INTO employee (name, title, salary)
VALUES ('Jim Johnson', 'Office Manager', '40,000')

And now my data will look like this:

id name        date_of_birth title          salary
 1 John Smith  1960-10-02    President      500,000
 2 Jane Doe    1982-05-05    Accountant      80,000
 3 Jim Johnson NULL          Office Manager  40,000
 4 Tim Smith   1899-04-11    Janitor         95,000
 5 Jim Johnson NULL          Office Manager  40,000

This is not what I want but I can't say I entirely disagree with what happened. If we talk in terms of mathematical sets,

{'Tim Smith', '1899-04-11'} = {'Tim Smith', '1899-04-11'} <-- TRUE
{'Tim Smith', '1899-04-11'} = {'Jane Doe', '1982-05-05'} <-- FALSE
{'Tim Smith', '1899-04-11'} = {'Jim Johnson', NULL} <-- UNKNOWN
{'Jim Johnson', NULL} = {'Jim Johnson', NULL} <-- UNKNOWN

My guess is that MySQL says, "Since I don't know that Jim Johnson with a NULL birth date isn't already in this table, I'll add him."

My question is: How can I prevent duplicates even though date_of_birth is not always known? The best I've come up with so far is to move date_of_birth to a different table. The problem with that, however, is that I might end up with, say, two cashiers with the same name, title and salary, different birth dates and no way to store them both without having duplicates.


Solution

  • A fundamental property of a unique key is that it must be unique. Making part of that key Nullable destroys this property.

    There are two possible solutions to your problem: