Project and restrict in relational algebra

The definition of the SELECT and PROJECT operators used below may be found in Chapter 6 of "Relational Database Design and Implementation", 4th Edition, Harrington, Jan L.

The SQL equivalent of the PROJECT (resp. RESTRICT) is SELECT (resp. WHERE with a predicate to reduce the number of elements in a relation). Let us use b_pred for the application of the predicate "pred" to a relation to reduce its elements. Then a(b_pred(relation))=b_pred(a(relation)) iff b_pred does not eliminate the column(s) supporting the predicate "pred". Furthermore if b_pred uses a column which is removed by a, the RHS expression is incorrect.

Is the result always correct when the RESTRICT operation is performed first?

What is a formal proof of that?

Why would we be interested in the opposite order of the operations?

Maybe performance is the only reason.

Solution

The two rules that can be applied to change the order of restrictions and projections maintaining the semantics of the expression are the following:

π_Y(σ_ΦX(E)) = σ_ΦX(π_Y(E)), if X ⊆ Y

otherwise, if the condition concerns attributes X ⊈ Y:

π_Y(σ_ΦX(E)) = π_Y(σ_ΦX(π_XY(E)))

where E is any relational expression producing a relation with a set of attributes that includes X and Y, π_X(E) is the projection of E over the set of attributes X and σ_ΦX(E) is the restriction over E with a condition ΦX over the set of attributes X.

These two rules are equivalence rules, so they can be applied in both directions. In general the optimizer tries to apply the restrictions before any other operation, if possible, and than to apply the projections before the joins.

Added

The first rule says that if you have a relation with attributes Z = Y ∪ W, performing a restriction over a subset of the attributes of Y, and then projecting the result on Y, is equivalent to perform first the projection, and then the restriction.

This equivalence can be proved in the following way.

Given E a relation with attributes Z = Y ∪ W, the definition of restriction is:

σ_ΦX(E) = { t | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) }

that is, the set of all the tuples of E such that ΦX(t) is true.

The definition of projection is:

π_Y(E) = { t₁ | t ∈ E ∧ Y ⊆ Z ∧ t₁ = t[Y] }

that is the set of tuples obtained by considering, for each tuple t of E, a (sub)tuple containing only the attributes Y of t.

So,

π_Y(σ_ΦX(E)) = π_Y(E') = { t₁ | t ∈ E' ∧ Y ⊆ Z ∧ t₁ = t[Y] }

where E' = σ_ΦX(E) = { t | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) }

Combining these two formulas, we get:

π_Y(σ_ΦX(E)) = { t₁ | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) ∧ Y ⊆ Z ∧ t₁ = t[Y] }

But since we know that X ⊆ Y, we can rewrite the formula as:

π_Y(σ_ΦX(E)) = { t₁ | t ∈ E ∧ X ⊆ Y ⊆ Z ∧ ΦX(t) ∧ t₁ = t[Y] } [1]

Starting from the other term,

σ_ΦX(π_Y(E)) = σ_ΦX(E'') = { t | t ∈ E'' ∧ X ⊆ Z ∧ ΦX(t) }

where E'' = π_Y(E) = { t₁ | t ∈ E ∧ Y ⊆ Z ∧ t₁ = t[Y] }

Again, combining these two formulas and noting that X ⊆ Y, we get:

σ_ΦX(π_Y(E)) = { t₁ | t ∈ E ∧ X ⊆ Y ⊆ Z ∧ ΦX(t₁) ∧ t₁ = t[Y] } [2]

[1] = [2] if we can show that ΦX(t) = ΦX(t[Y]), and this is true since both conditions are true or false at the same time, given that the condition is concerns only the attributes X, which are present both in t and in t[Y] (since X ⊆ Y).

The second rule says that, if you have a relation with attributes Z = X ∪ Y ∪ W, with X - Y ≠ ∅ performing a restriction over the attributes of X, and then projecting the result on Y, is equivalent to perform first a projection over the attributes X ∪ Y, then perform the restriction, and finally perform a new projection over the attributes X.

Also in this case a formal proof can be given, by reasoning in an analogous way to the above proof, but it is omitted here for brevity.