Why is a lossy decomposition called lossy?
What do we lose in a lossy decomposition?
There is a relation R. It is decomposed into two relation R1 and R2.
If R = (R1 JOIN R2) then it is a lossless join decomposition.
If R is a subset of (R1 JOIN R2) then it is a lossy join decomposition.
In a lossy join decomposition, after joining R1 and R2 we get more records compared to R.
What are we losing?
Since we are dealing with some instance, R, the relation contains a fixed number of records. It also implicitly contains information about which records do not exist. If the join of R1 and R2 produce extra records, we lose informational integrity.
Suppose you were using the following relation R = (SSN, Name, Address):
R
SSN Name Address
1111 Joe 1 Pine
2222 Alice 2 Oak
3333 Alice 3 Pine
Let R1 = (SSN, Name) and R2 = (Name, Address).
R1 R2
SSN Name | Name Address
1111 Joe | Joe 1 Pine
2222 Alice | Alice 2 Oak
3333 Alice | Alice 3 Pine
The join of R1 and R2 would produce the following table:
R1 join R2
SSN Name Address
1111 Joe 1 Pine
2222 Alice 2 Oak
2222 Alice 3 Pine
3333 Alice 2 Oak
3333 Alice 3 Pine
The information lost in this example is the address for person 2222 and 3333. In the original relation, R, person 2222 lives at 2 Oak. In the join of R1 and R2, person 2222 either lives at 2 Oak or 3 Pine--we no longer have this information.
This is how extra information can result in a lossy decomposition. The records were not lost--what we lost was information about which records were in the original relation.