C. J. Date (author of the book SQL and Relational Theory) is well-known for criticising SQL's three-valued logic (3VL).
(Date's critique of SQL's 3VL has been criticized by Claude Rubinson (includes the original critique by C. J. Date) who replied in turn.)
Date makes some strong points about why 3VL should be avoided in SQL; however he doesn't outline how a database model would look like if nullable columns weren't allowed.
Example table where we have one nullable column:
+-------------------------------------------+
| People |
+------------+--------------+---------------+
| PersonID | Name | DateOfBirth |
+============+--------------+---------------+
| 1 | Banana Man | NULL |
+------------+--------------+---------------+
Option 1: Emulating NULL
through a flag and a default value:
Instead of making the column nullable, any default value is specified (e.g. 1900-01-01
). An additional BOOLEAN
column will specify whether the value in DateOfBirth
should simply be ignored or whether it actually contains data.
+------------------------------------------------------------------+
| People1 |
+------------+--------------+----------------------+---------------+
| PersonID | Name | IsDateOfBirthKnown | DateOfBirth |
+============+--------------+----------------------+---------------+
| 1 | Banana Man | FALSE | 1900-01-01 |
+------------+--------------+----------------------+---------------+
Option 2: Turning a nullable column into a separate table:
The nullable column is replaced by a new table DatesOfBirth
. If a record doesn't have data for that column, there won't be a record in the new table:
+---------------------------+ 1 0..1 +----------------------------+
| People2 | <-------> | DatesOfBirth2 |
+------------+--------------+ +------------+---------------+
| PersonID | Name | | PersonID | DateOfBirth |
+============+--------------+ +============+---------------+
| 1 | Banana Man |
+------------+--------------+
While this seems like the better solution, this would possibly result in many tables that need to be joined for a single query. Since OUTER JOIN
s won't be allowed (because they would introduce NULL
into the result set), all the necessary data could possibly no longer be fetched with just a single query as before.
What are other options for eliminating NULL
?
I saw Date's colleague Hugh Darwen discuss this issue in an excellent presentation "How To Handle Missing Information Without Using NULL", which is available on the Third Manifesto website.
His solution is a variant on your second approach. It's sixth normal form, with tables to hold both Date of Birth and identifiers where it is unknown:
# +-----------------------------+ 1 0..1 +----------------------------+
# | People' | <-------> | DatesOfBirth |
# +------------+----------------+ +------------+---------------+
# | PersonID | Name | | PersonID | DateOfBirth |
# +============+----------------+ +============+---------------+
# | 1 | Banana Man | ! 2 | 20-MAY-1991 |
# | 2 | Satsuma Girl | +------------+---------------+
# +------------+----------------+
# 1 0..1 +------------+
# <-------> | DobUnknown |
# +------------+
# | PersonID |
# +============+
# | 1 |
# +------------+
Selecting from People then requires joining all three tables, including boilerplate to indicate the unknown Dates Of Birth.
Of course, this is somewhat theoretical. The state of SQL these days is still not sufficiently advanced to handle all this. Hugh's presentation covers these shortcomings. One thing he mentions is not entirely correct: some flavours of SQL do support multiple assignment - for instance Oracle's INSERT ALL syntax.