databaserelational-algebra

Relational Algebra Intersection Properties


In a test question I asserted

    R ∩ (R ∩ S) = R ∩ S

If this was set theory, I would argue

R ∩ (R ∩ S)
= (R ∩ R) ∩ S by Associativity
= (R) ∩ S by Idempotence
= R ∩ S

With relational algebra, we are instead operating on bags. I can take these same steps (idempotence seems trivial, and associativity is suggested on page 3 of http://comet.lehman.cuny.edu/stjohn/teaching/db/ullmanSlidesF00/slides7.pdf) with bags, but I cannot make it to a formal proof.

How do I prove (or disprove, by counterexample or otherwise) associativity and idempotence of intersection in relational algebra?


Solution

  • The Relational Model is formally defined over sets, so the Relational Algebra is defined only over sets. But since the Relational Databases extended the model to include multisets, or bags, the Relational Algebra has been similarly extended over multisets, and almost all the modern books on Databases describes this extension.

    In particular, using the notation Opb to represent the variant of operator Op over multisets, we can define the operators ∩b, ∪b and −b as follows:

    R ∩b S: if an element t appear n times in R and m times in S, then it appears min(n,m) times in R ∩b S;

    R ∪b S: if an element t appear n times in R and m times in S, then it appears n+m times in R ∪b S;

    R −b S: if an element t appear n times in R and m times in S, then it appears max(0,n-m) times in R −b S.


    You want to show that:

    R ∩b S = R ∩b (R ∩b S)

    As your correct derivation shows, this can be proved if we can prove the Associativity and Idempotence properties for the operator ∩b.

    Idempotence:

    R ∩b R = R

    In this case, since each element appear the same number of times in both the operands, that is m = n, we have that it appears min(n, n) = n times in the result, i.e. it appears the same number of times of R.

    Associativity:

    (R ∩b S) ∩b T = R ∩b (S ∩b T)

    Suppose that a certain element t appears n times in R, m times in S, and k times in T, we have that in:

    (R ∩b S) ∩b T

    it will appear min(min(n,m), k) times in the result, and this is equal to min(n,m,k). On the other hand, in:

    R ∩b (S ∩b T)

    it will appear min(n, min(m,k)) times in the result, but this again is equal to min(n,m,k).

    So this means that it will appear the same number of times in both the results, and so the two results are equal.