sqlpostgresql

SQL statement evaluation order


I'm learning PostgreSQL and want to understand the exact order in which SQL statements are evaluated. After researching, I found the following evaluation order:

 1. FROM 
 2. ON 
 3. JOIN 
 4. WHERE 
 5. GROUP BY 
 6. HAVING
 7. SELECT 
 8. AS  
 9. DISTINCT 
 10. ORDER BY 
 11. LIMIT / OFFSET   

Based on this order, it seems that when I create an alias, I can't use it in the clauses that are evaluated before the SELECT statement. However, I'm confused about how this works with subqueries. For example:

SELECT first_name, last_name
    FROM customer AS c
    WHERE EXISTS(
    (SELECT * FROM payment AS p
    WHERE p.customer_id = c.customer_id
    AND amount > 11)
    );

How can the subquery use the alias c and, how does the subquery get access to the alias c before it is created?

Could someone provide insights how does this work?

Thank you in advance for the help!


Solution

  • The logical order of standard ISO SQL clauses for the SQL SELECT statement is :

    1. FROM
    2. WHERE
    3. GROUP BY
    4. HAVING
    5. WINDOW
    6. SELECT
    7. ORDER BY

    All other keyword are subclauses (JOIN / ON, DISTINCT, OFFSET / FETCH...) or operators (IN, BETWEEN, EXISTS...) depending on SQL clauses.

    Contrary to what @Laurenz_Albe says, the AS keyword introduces an alias, that is to say a new name, for a column or a table, which can be used in the clauses located after it, in the logical order. Thus an AS defining an alias for a column of the SELECT clause can only be used in the ORDER BY clause and in no case in other of the following elements of the SELECT clause.

    So :

    SELECT COL1 AS X, X + 1
    

    Must throw an exception.

    On the other hand, a table alias can be used in any other clause, including following the sequence of elements of the FROM clause in which it is defined.

    So :

    SELECT *
    FROM   Tab1 AS T
           JOIN Tab2 AS U
              ON T.id = U.id
    

    ...is correct

    The query givent by @Laurenz_Albe :

    SELECT t.c AS x
    FROM (VALUES (1), (2)) AS t(c)
    WHERE EXISTS (SELECT WHERE t.c = 1);
    

    ...is syntaxically false in SQL and cannott be accepted by any RDBMS, except PostGreSQL. Because the inner SELECT clause must have almost an element. A correct query can be :

    SELECT t.c AS x
    FROM (VALUES (1), (2)) AS t(c)
    WHERE EXISTS (SELECT * WHERE t.c = 1);
    

    Avoid to write such stupid query that is syntaxically innaceptable...

    By the way,

    kewords ON and PARTITIONNED depends of a JOIN, that depends of the FROM clause, like LATERAL { CROSS |OUTER } APPLY.

    keyword DISTINCT wich is an operator depend of the FROM clause and must preceed any element of this clause

    keyword LIMIT is not a part of the ISO SQL standard.

    Keyword OFFSET depend of the ORDER BY clause and FETCH depend of OFFSET.

    But do not make confusion between the logical order and the physical treatment of the query. A query is just a sentence translated into a mathematical formulae of relational algebra and this formulae can be:

    As a basic example of simplification :

    SELECT *
    FROM   T
    WHERE  Col1 + 3 = 5
    

    would be rewrited as :

    SELECT *
    FROM   T
    WHERE  Col1 = 2
    

    A more sophisticated simplification by rewriting can be :

    SELECT COUNT(*), PrimaryKeyCol, Col2, Col3, Col4, Col5 
    FROM   T
    GROUP  BY PrimaryKeyCol, Col2, Col3, Col4, Col5 
    

    ...that would be rewrited as ::

    SELECT COUNT(*), PrimaryKeyCol, Col2, Col3, Col4, Col5 
    FROM   T
    GROUP  BY PrimaryKeyCol 
    

    Depending of the level of "intelligence" of the optimizer (stupidly called planer in PostGreSQL)...