The definition of the SELECT and PROJECT operators used below may be found in Chapter 6 of "Relational Database Design and Implementation", 4th Edition, Harrington, Jan L.
The SQL equivalent of the PROJECT (resp. RESTRICT) is SELECT (resp. WHERE with a predicate to reduce the number of elements in a relation). Let us use b_pred for the application of the predicate "pred" to a relation to reduce its elements. Then a(b_pred(relation))=b_pred(a(relation)) iff b_pred does not eliminate the column(s) supporting the predicate "pred". Furthermore if b_pred uses a column which is removed by a, the RHS expression is incorrect.
Is the result always correct when the RESTRICT operation is performed first?
What is a formal proof of that?
Why would we be interested in the opposite order of the operations?
Maybe performance is the only reason.
The two rules that can be applied to change the order of restrictions and projections maintaining the semantics of the expression are the following:
πY(σΦX(E)) = σΦX(πY(E)), if X ⊆ Y
otherwise, if the condition concerns attributes X ⊈ Y:
πY(σΦX(E)) = πY(σΦX(πXY(E)))
where E is any relational expression producing a relation with a set of attributes that includes X and Y, πX(E) is the projection of E over the set of attributes X and σΦX(E) is the restriction over E with a condition ΦX over the set of attributes X.
These two rules are equivalence rules, so they can be applied in both directions. In general the optimizer tries to apply the restrictions before any other operation, if possible, and than to apply the projections before the joins.
Added
The first rule says that if you have a relation with attributes Z = Y ∪ W, performing a restriction over a subset of the attributes of Y, and then projecting the result on Y, is equivalent to perform first the projection, and then the restriction.
This equivalence can be proved in the following way.
Given E a relation with attributes Z = Y ∪ W, the definition of restriction is:
σΦX(E) = { t | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) }
that is, the set of all the tuples of E such that ΦX(t) is true.
The definition of projection is:
πY(E) = { t1 | t ∈ E ∧ Y ⊆ Z ∧ t1 = t[Y] }
that is the set of tuples obtained by considering, for each tuple t of E, a (sub)tuple containing only the attributes Y of t.
So,
πY(σΦX(E)) = πY(E') = { t1 | t ∈ E' ∧ Y ⊆ Z ∧ t1 = t[Y] }
where E' = σΦX(E) = { t | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) }
Combining these two formulas, we get:
πY(σΦX(E)) = { t1 | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) ∧ Y ⊆ Z ∧ t1 = t[Y] }
But since we know that X ⊆ Y, we can rewrite the formula as:
πY(σΦX(E)) = { t1 | t ∈ E ∧ X ⊆ Y ⊆ Z ∧ ΦX(t) ∧ t1 = t[Y] } [1]
Starting from the other term,
σΦX(πY(E)) = σΦX(E'') = { t | t ∈ E'' ∧ X ⊆ Z ∧ ΦX(t) }
where E'' = πY(E) = { t1 | t ∈ E ∧ Y ⊆ Z ∧ t1 = t[Y] }
Again, combining these two formulas and noting that X ⊆ Y, we get:
σΦX(πY(E)) = { t1 | t ∈ E ∧ X ⊆ Y ⊆ Z ∧ ΦX(t1) ∧ t1 = t[Y] } [2]
[1] = [2] if we can show that ΦX(t) = ΦX(t[Y]), and this is true since both conditions are true or false at the same time, given that the condition is concerns only the attributes X, which are present both in t and in t[Y] (since X ⊆ Y).
The second rule says that, if you have a relation with attributes Z = X ∪ Y ∪ W, with X - Y ≠ ∅ performing a restriction over the attributes of X, and then projecting the result on Y, is equivalent to perform first a projection over the attributes X ∪ Y, then perform the restriction, and finally perform a new projection over the attributes X.
Also in this case a formal proof can be given, by reasoning in an analogous way to the above proof, but it is omitted here for brevity.