[SOLVED] Why is a lossy decomposition called lossy?

Why is a lossy decomposition called lossy?

What do we lose in a lossy decomposition?

There is a relation R. It is decomposed into two relation R1 and R2.

If R = (R1 JOIN R2) then it is a lossless join decomposition.

If R is a subset of (R1 JOIN R2) then it is a lossy join decomposition.

In a lossy join decomposition, after joining R1 and R2 we get more records compared to R.

What are we losing?

Solution

Since we are dealing with some instance, R, the relation contains a fixed number of records. It also implicitly contains information about which records do not exist. If the join of R1 and R2 produce extra records, we lose informational integrity.

Suppose you were using the following relation R = (SSN, Name, Address):

           R
SSN     Name      Address
1111    Joe       1 Pine
2222    Alice     2 Oak
3333    Alice     3 Pine

Let R1 = (SSN, Name) and R2 = (Name, Address).

     R1                     R2
SSN     Name     |     Name    Address
1111    Joe      |     Joe     1 Pine
2222    Alice    |     Alice   2 Oak
3333    Alice    |     Alice   3 Pine

The join of R1 and R2 would produce the following table:

       R1 join R2
SSN     Name      Address
1111    Joe       1 Pine
2222    Alice     2 Oak
2222    Alice     3 Pine
3333    Alice     2 Oak
3333    Alice     3 Pine

The information lost in this example is the address for person 2222 and 3333. In the original relation, R, person 2222 lives at 2 Oak. In the join of R1 and R2, person 2222 either lives at 2 Oak or 3 Pine--we no longer have this information.

This is how extra information can result in a lossy decomposition. The records were not lost--what we lost was information about which records were in the original relation.