databasekeyrelational-databasedatabase-normalizationbcnf

BCNF: Looking for example that actually uses superkey instead of candidate key


The definition of the Boyce–Codd normal form states that the determinants of all non-trivial functional dependencies have to be superkeys.

All the examples for relations in BCNF I found make use of candidate keys. I am looking for an example that actually has a superkey as determinant which is not a candidate key.

I fail to come up with a relation that only uses superkeys which can't be transformed to use candidate keys.

Let's say we have a relation with an candidate key and an additional functional dependency with a superkey as determinant.

R1(A,B,C)
{A}
A,B -> C

This additional FD is redundant because it contains an candidate key that obviously detemines the other attribute (A -> C).

Trying to build another example with two candidate keys is also useless.

R2(A,B,C,D)
{A,B},{B,C}
A,B,C -> D

This has the exact same problem as above.

I am actually wondering if there even is an example without candidate keys. But why would the definition be broader than necessary? Or are the definitions equivalent as the dependencies can always be transformed?


Solution

  • The point is that, when defining a normal form, we must express it in a general form, as a property of all the functional dependencies holding on a certain relation.

    Instead, when we reason about a particular relation schema, we usually have only a subset of all the functional dependencies (since their number can be too large, being possibly exponential with the number of attributes). The particular set of dependencies used, denoted usually by the letter F, has a special property: it is a cover of all the dependencies holding in the relation, that is from it we can derive all the dependencies of the relation by applying, in all the possible ways, a set of axioms, called Armstrong’s axioms.

    F, the set of dependencies specified together with the attributes in a relational schema, can be given in different ways: for instance in an exercise they can be given as input to the exercise, in real database design they can describe a set of constraints considered important for modelling a certain real-word domain, etc.

    Even if they are extracted from the knowledge about a situation to be modeled through a database, they could contain dependencies implied by others already given, or maybe can contain redundant attributes, etc.

    For these reasons, it is considered an important first step in normalization to find the canonical cover of the set of dependencies given, that is a cover constituted by a set of dependencies that: a) have only a single attribute on the rigth part; b) do not have superflous attributes on the left part (i.e. attributes that can be removed maintaing the property of being a cover); c) do not have redundant dependencies (i.e. dependencies that can be derived from others through the Armstrong’s axioms).

    Now let’s consider the general definition of BCNF:

    A relation schema R<T,F> is in BCNF if and only if for each non-trivial dependency X → Y of F+, X is a superkey.

    Note that the we are talking about the dependencies in F+, which is the closure of F, in other words, which contains all the dependencies holding in R and derived in some way from F. So if the relation R has a candidate key XK, obviously not only XK → T holds, for instance, but for all the supersets S of XK we will have that S → T holds, and so the definition of the normal form must allow those dependencies.

    Now, it is possible to prove, from the general definition of BCNF, the following theorem, that in some way simplifies it (and makes efficient the test to check if a relation is already in BCNF):

    Theorem: A relation schema R<T,F> is in BCNF if and only if for each non-trivial dependency X → Y of F, X is a superkey.

    See the difference? We are now talking about F and not F+.

    And this theorem has the following corollary:

    Corollary: A relation schema R<T,F> in which F is a canonical cover, is in BCNF if and only if for each non-trivial dependency X → A of F, X is a candidate key.

    Since the dependencies in a canonical cover do not have superfluous attributes, if the relation is in BCNF every determinant (left hand side of a functional dependency) is obviously a candidate key (not a generic superkey), and this explain the difference between the definition and the examples that sometimes we found on the books.