relational-databaserelational-algebra

Project and restrict in relational algebra


The definition of the SELECT and PROJECT operators used below may be found in Chapter 6 of "Relational Database Design and Implementation", 4th Edition, Harrington, Jan L.

The SQL equivalent of the PROJECT (resp. RESTRICT) is SELECT (resp. WHERE with a predicate to reduce the number of elements in a relation). Let us use b_pred for the application of the predicate "pred" to a relation to reduce its elements. Then a(b_pred(relation))=b_pred(a(relation)) iff b_pred does not eliminate the column(s) supporting the predicate "pred". Furthermore if b_pred uses a column which is removed by a, the RHS expression is incorrect.

Is the result always correct when the RESTRICT operation is performed first?

What is a formal proof of that?

Why would we be interested in the opposite order of the operations?

Maybe performance is the only reason.


Solution

  • The two rules that can be applied to change the order of restrictions and projections maintaining the semantics of the expression are the following:

    πYΦX(E)) = σΦXY(E)), if X ⊆ Y

    otherwise, if the condition concerns attributes X ⊈ Y:

    πYΦX(E)) = πYΦXXY(E)))

    where E is any relational expression producing a relation with a set of attributes that includes X and Y, πX(E) is the projection of E over the set of attributes X and σΦX(E) is the restriction over E with a condition ΦX over the set of attributes X.

    These two rules are equivalence rules, so they can be applied in both directions. In general the optimizer tries to apply the restrictions before any other operation, if possible, and than to apply the projections before the joins.

    Added

    The first rule says that if you have a relation with attributes Z = Y ∪ W, performing a restriction over a subset of the attributes of Y, and then projecting the result on Y, is equivalent to perform first the projection, and then the restriction.

    This equivalence can be proved in the following way.

    Given E a relation with attributes Z = Y ∪ W, the definition of restriction is:

    σΦX(E) = { t | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) }

    that is, the set of all the tuples of E such that ΦX(t) is true.

    The definition of projection is:

    πY(E) = { t1 | t ∈ E ∧ Y ⊆ Z ∧ t1 = t[Y] }

    that is the set of tuples obtained by considering, for each tuple t of E, a (sub)tuple containing only the attributes Y of t.

    So,

    πYΦX(E)) = πY(E') = { t1 | t ∈ E' ∧ Y ⊆ Z ∧ t1 = t[Y] }

    where E' = σΦX(E) = { t | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) }

    Combining these two formulas, we get:

    πYΦX(E)) = { t1 | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) ∧ Y ⊆ Z ∧ t1 = t[Y] }

    But since we know that X ⊆ Y, we can rewrite the formula as:

    πYΦX(E)) = { t1 | t ∈ E ∧ X ⊆ Y ⊆ Z ∧ ΦX(t) ∧ t1 = t[Y] }       [1]

    Starting from the other term,

    σΦXY(E)) = σΦX(E'') = { t | t ∈ E'' ∧ X ⊆ Z ∧ ΦX(t) }

    where E'' = πY(E) = { t1 | t ∈ E ∧ Y ⊆ Z ∧ t1 = t[Y] }

    Again, combining these two formulas and noting that X ⊆ Y, we get:

    σΦXY(E)) = { t1 | t ∈ E ∧ X ⊆ Y ⊆ Z ∧ ΦX(t1) ∧ t1 = t[Y] }       [2]

    [1] = [2] if we can show that ΦX(t) = ΦX(t[Y]), and this is true since both conditions are true or false at the same time, given that the condition is concerns only the attributes X, which are present both in t and in t[Y] (since X ⊆ Y).

    The second rule says that, if you have a relation with attributes Z = X ∪ Y ∪ W, with X - Y ≠ ∅ performing a restriction over the attributes of X, and then projecting the result on Y, is equivalent to perform first a projection over the attributes X ∪ Y, then perform the restriction, and finally perform a new projection over the attributes X.

    Also in this case a formal proof can be given, by reasoning in an analogous way to the above proof, but it is omitted here for brevity.