sqlpostgresqlquery-optimization

Can PostgreSQL optimizer carry `ORDER BY` out of `JOIN` subquery?


There is a "theoretical" query:

SELECT * FROM a
JOIN (
    SELECT b.pk, b.not_pk
    FROM b
    ORDER BY b.not_pk
) AS b2
USING (pk)

and EXPLAIN shows Sort on the whole b.

Can it be carried out of the join (by the optimizer), theoretically? If so, why it isn't?


Solution

  • Can it be carried out of the join, theoretically?

    It cannot.

    If it could, it would have to consider resolving it against any order by the outer query may have. It would also break limit/fetch first in subqueries, or examples like this: demo at db<>fiddle

    select array_agg(x)
    from(select x from t order by x)_;
    

    Which is currently equivalent to this

    select array_agg(x order by x) from t
    

    Unless you're really using it with something order-dependent, sorting inside CTEs and subqueries doesn't make much sense, because the final, outermost, top-level query is free to rearrange its results arbitrarily, or according to a completely different order by.

    From the rewriter/planner/optimiser point of view it could even make sense to completely ignore and discard all inner order bys in the absence of order-dependent features. I don't think it's currently done, and I'd assume that's precisely because someone could reasonably expect that inner sorting to drive something further up, like the array_agg() above.

    In your case, you should see that order by goes to waste because it pulls the data, sorts it, then discards that order and sorts it again by the pk you're joining on, like join operations do.


    I think what you might've had in mind, or what's a logical extension of this idea, is whether the a.pk=b.pk (resulting from using(pk)) condition can be pushed down into the subquery, to reduce the set prior to sorting.

    That would be a good idea and something like this is in fact done by most planners, not just Postgres, and usually referred to as "condition pushdown" which you can see in the example below.

    explain analyze verbose
    SELECT * FROM a
    JOIN (
        SELECT b.pk, b.not_pk
        FROM b
        ORDER BY b.not_pk
    ) AS b2
    USING (pk)
    WHERE b2.not_pk<>0;--this will get pushed down
    
    QUERY PLAN

    Merge Join (cost=41147.43..659605.10 rows=41187641 width=36) (actual time=44.535..44.538 rows=0 loops=1)

      Output: a.pk, b2.not_pk

      Merge Cond: (a.pk = b2.pk)

      -> Sort (cost=5828.76..5979.38 rows=60248 width=32) (actual time=27.691..27.692 rows=1 loops=1)

            Output: a.pk

            Sort Key: a.pk

            Sort Method: external merge Disk: 1648kB

            -> Seq Scan on public.a (cost=0.00..1045.48 rows=60248 width=32) (actual time=0.014..8.607 rows=100000 loops=1)

                  Output: a.pk

      -> Materialize (cost=35318.67..36002.31 rows=136727 width=36) (actual time=16.840..16.841 rows=0 loops=1)

            Output: b2.not_pk, b2.pk

            -> Sort (cost=35318.67..35660.49 rows=136727 width=36) (actual time=16.837..16.838 rows=0 loops=1)

                  Output: b2.not_pk, b2.pk

                  Sort Key: b2.pk

                  Sort Method: quicksort Memory: 25kB

                  -> Subquery Scan on b2 (cost=18204.63..19913.72 rows=136727 width=36) (actual time=16.828..16.829 rows=0 loops=1)

                        Output: b2.not_pk, b2.pk

                        -> Sort (cost=18204.63..18546.45 rows=136727 width=36) (actual time=16.827..16.827 rows=0 loops=1)

                              Output: b.pk, b.not_pk

                              Sort Key: b.not_pk

                              Sort Method: quicksort Memory: 25kB

                              -> Seq Scan on public.b (cost=0.00..2799.68 rows=136727 width=36) (actual time=16.822..16.822 rows=0 loops=1)

                                    Output: b.pk, b.not_pk

                                    Filter: (b.not_pk <> 0) This got pushed down all the way here

                                    Rows Removed by Filter: 200000

    Planning Time: 0.131 ms

    Execution Time: 44.849 ms