databasedatabase-normalization

Why is a lossy decomposition called lossy?


Why is a lossy decomposition called lossy?

What do we lose in a lossy decomposition?

There is a relation R. It is decomposed into two relation R1 and R2.

If R = (R1 JOIN R2) then it is a lossless join decomposition.

If R is a subset of (R1 JOIN R2) then it is a lossy join decomposition.

In a lossy join decomposition, after joining R1 and R2 we get more records compared to R.

What are we losing?


Solution

  • Since we are dealing with some instance, R, the relation contains a fixed number of records. It also implicitly contains information about which records do not exist. If the join of R1 and R2 produce extra records, we lose informational integrity.

    Suppose you were using the following relation R = (SSN, Name, Address):

               R
    SSN     Name      Address
    1111    Joe       1 Pine
    2222    Alice     2 Oak
    3333    Alice     3 Pine
    

    Let R1 = (SSN, Name) and R2 = (Name, Address).

         R1                     R2
    SSN     Name     |     Name    Address
    1111    Joe      |     Joe     1 Pine
    2222    Alice    |     Alice   2 Oak
    3333    Alice    |     Alice   3 Pine
    

    The join of R1 and R2 would produce the following table:

           R1 join R2
    SSN     Name      Address
    1111    Joe       1 Pine
    2222    Alice     2 Oak
    2222    Alice     3 Pine
    3333    Alice     2 Oak
    3333    Alice     3 Pine
    

    The information lost in this example is the address for person 2222 and 3333. In the original relation, R, person 2222 lives at 2 Oak. In the join of R1 and R2, person 2222 either lives at 2 Oak or 3 Pine--we no longer have this information.

    This is how extra information can result in a lossy decomposition. The records were not lost--what we lost was information about which records were in the original relation.