database-designrelational-databasedatabase-normalizationfunctional-dependencies3nf

Functional dependencies after normalization


The exercise from my schoolbook asks us to normalize a relation to 2NF and then to 3NF.

R (ABCDEF)
F = {AD->FE, BC->E, FEA->D, AC->DE, F->E, BD->A, F->C, ABC->AEF, B->F}

Following the book, I minimized the functional dependencies to

F = {AD->F, AC->D, BD->A, F->E, F->C, B->F}

Now we can find the keys:

{AB} and {BD}

Next,

B->F

breaks 2NF, since a non-key attribute depends on a part of a key. So we divide our relation into two new tables:

R1 (BFEC) and R2 (ABD)

Here I get lost. Those new tables need to inherit the functional dependencies, so I did this:

F1 = {F->E, F->C, B->F} and F2 = {BD->A}

I was trying to base which relation gets which dependency on the attributes and pairing them together. However, that leaves two functional dependencies from the original set:

AD->F, AC->D

How do I figure out the new functional dependencies for each new table that I create in the process of normalization?


Solution

  • Suppose we want to losslessly decompose so that an FD (functional dependency) that holds in an original relation also holds in one of the components and so holds in their join. We can do this using the FD's attributes for one component while getting another component by dropping the determined attributes from the original, as long as the components share the attributes of a CK (candidate key) of one of them. You will have been told that that is so. Here for problematic FD B -> F that suggests decomposition {ABCDE} & {BF}, since common attribute set {B} is a CK of the latter.

    So we need to "divide" our relation into two new tables : R1 = (BFEC) and R2 = (ABD)

    That is not a lossless decomposition given these FDs, and it's not clear why you think it is. (The shared columns {B} don't include a CK of one of those components.)

    (The term is not "divide", it is "decompose". It's not clear why instead of using the right term you used a different term that doesn't mean that, while even using scare quotes to show that you know it doesn't mean that, while not even saying what you do mean by it.)

    You can't just take the subset of FDs in a cover that hold in a component as a cover for the component. You need to see which of all the original FDs that hold have the component's attributes & so hold in the component. Why? A cover is a set of FDs that imply all the FDs that hold & no others. There might be a FD holding in the component that isn't in the cover but is implied by sets of FDs that are. If all those sets include FDs whose attributes aren't in the component then that FD won't be implied by the subset of FDs of the cover that have attributes in the component. So that subset won't be a cover for the component.

    Decomposing by splitting off some FD as described above might mean that some non-trivial FD that holds in the original doesn't have all its attributes in either component and so doesn't hold in them. (Notice also that the FDs that hold include all the ones implied by the listed ones, given by Armstrong's axioms, not just the listed ones.) Then although we would have a lossless decomposition, we couldn't check that the latter FD holds in the join of the components by checking that it holds in the components. This is called not preserving that FD. But it turns out that we can always normalize to 3NF while preserving FDs. That is why we don't normalize to a NF (normal form) by going through lower NFs or by randomly picking FDs, but instead use an algorithm specially designed to get to that NF while preserving FDs.

    Find & follow an information modelling & relational database textbook. (Dozens are free online, also academic slides & courses.)

    Here is Bernstein's FD-preserving 3NF "synthesis" algorithm from some slides by Ullman:

    Given: a relation R and a cover F for the FDs that hold in R

    1. Find C, a canonical cover for F
    2. For each FD X -> Y in C, create a relation with schema XY
    3. Eliminate a relation if its schema is a subset of another
    4. If none of the schemas created so far contains a CK of R, add a relation schema containing a CK of R

    Since this preserves FDs & 3NF implies 2NF, its output is also a 2NF schema with FDs preserved. (Its output is actually in EKNF, slightly stronger that 3NF.)