relational-databasedatabase-normalization

Intuition behind 2NF in normalization of relational databases


2NF does not allow a partial dependency; i.e. any non-prime attribute should not be dependent on a subset of the primary key (excluding itself; otherwise it will be full functional dependency).

But why?

What is the problem we have with partial dependencies?

What protocol will it break if we keep it as it is?


Solution

  • William Kent's "A Simple Guide to Five Normal Forms in Relational Database Theory" is a good source to know about. Here's how he describes the problems with a partial dependency.

    Consider the following inventory record:

        ---------------------------------------------------
        | PART | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS |
        ====================-------------------------------
    

    The key here consists of the PART and WAREHOUSE fields together, but WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone. The basic problems with this design are:

    • The warehouse address is repeated in every record that refers to a part stored in that warehouse.
    • If the address of the warehouse changes, every record referring to a part stored in that warehouse must be updated.
    • Because of the redundancy, the data might become inconsistent, with different records showing different addresses for the same warehouse.
    • If at some point in time there are no parts stored in the warehouse, there may be no record in which to keep the warehouse's address.

    By the way, you said, "any non-prime attribute should not be dependent on subset of primary key"; you should have said something more like "any non-prime attribute should not be dependent on a subset of any candidate key". Most articles and books about relational theory simplify their explanations by assuming there's only one candidate key. But the normal forms are defined in terms of every candidate key.