sqldatabase-normalizationfunctional-dependenciessql-null3nf

Mutually exclusive columns and violating 3NF


CREATE TABLE foo
(
   id SERIAL PRIMARY KEY,
   a TEXT,
   b TEXT
)

Column a can take a value only when column b is empty and vice versa:

id | a     | b
---+-------+------
1  | NULL  | 'abc'
2  | 'cba' | NULL

There is a dependency between columns a and b, because we can't just set any value in those columns without knowing what is stored in the other.

Is the table structure above a violation of the third normal form of the database?


Solution

  • According to "Further normalization of the data base relational model" by E. F. Codd, 1971:

    1.2 Functional Dependence. Attribute B of relation R is functionally dependent on attribute A of R if, at every instant of time, each value in A has no more than one value in B associated with it under R.

    3.3 Definition of Third Normal Form. A relation R is in third normal form if it is in second normal form and every non-prime attribute of R is non-transitively dependent on each.

    You have the following rules:

    1. if A is not NULL, then B is NULL;
    2. if B is not NULL, then A is NULL.

    Case if we treat NULL as a value

    You have a constraint that describes a mutual exclusion of column values. So the value of B determines A, and vice versa, depending on the case (#1 or #2). It is a functional dependency (1).

    If we build a full dependency graph of fields, we would have B -> A -> PK (case #1), or A -> B -> PK (case #2).

    As we have more than one dependency in a row, it is a transitive dependency tree.

    So, it is a violation of Third Normal Form (3NF), as stated by (2).

    Case if we don't treat NULL as a value

    As suggested by @Jonas Metzler, NULL is not a value. Thus, we have a plain constraint check not to have both values present.

    Even though B might seem to determine A in this case, and vice versa, it does not determine its state, because NULL is just a marker in this case. It just prohibits use of two column simultaneously.

    There is no 3NF violation in this case.


    It is rather a philosophical question. The answer depends on how we understand what is NULL, and how columns relate.