sqlpostgresqlfunctionpostgresql-performancelateral-join

Performance loss introducing lateral join in Postgres query


I am dealing with a PostgreSQL (v14) query of this kind

SELECT
    ...,
    EXISTS (
        SELECT 1
        FROM table2 t2
        WHERE t2.fk = t1.id
          AND LOWER(t2.table_name) = 't1'
    ) AS t2_record_exists
FROM table1 t1;

and was hoping to extract the logic in a function to be used in lateral join, in order to compute that field in a more readable way (as it is to be used in a view).

This is the resulting query

SELECT
    ...,
    t2_record_exists.t2_record_exists
FROM table1 t1
    LEFT JOIN LATERAL does_t2_record_exist(t1.id, 't1') t2_record_exists(t2_record_exists) ON TRUE;

That makes use of this function

CREATE OR REPLACE FUNCTION does_t2_record_exist(object_id int8, _table_name text)
    RETURNS bool
    LANGUAGE plpgsql
AS $function$
BEGIN
    RETURN EXISTS (
        SELECT 1
        FROM table2 t2
        WHERE t2.fk = object_id
            AND LOWER(t2.table_name) = _table_name
    );
END $function$;

The second query suffers a severe performance loss, as it executes in about 6000ms, while the first gets it done in 300ms.

I don't know why this would happen, as I naively assumed that the very same operation (the EXISTS subquery) would be executed the same amount of times (once per row).

What is going wrong here? How can one foresee such performance issues beforehand?

EDIT: Here are the query plans (obtained with EXPLAIN ANALYZE) for my specific case

Query 1 (with subquery):

Index Only Scan using cos_table1 on table1 t1 (cost=0.43..3723311.61 rows=1481535 width=9) (actual time=56.299..247.674 rows=1477585 loops=1)
  Heap Fetches: 46760
  SubPlan 2
    ->  Seq Scan on table2 t2  (cost=0.00..15.27 rows=2 width=8) (actual time=11.300..11.454 rows=372 loops=1)
          Filter: (lower(table_name) = 't1'::text)
          Rows Removed by Filter: 113
Planning Time: 0.085 ms
JIT:
  Functions: 13
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.769 ms, Inlining 17.544 ms, Optimization 24.557 ms, Emission 13.652 ms, Total 56.524 ms
Execution Time: 276.285 ms

Query 2 (with LATERAL JOIN):

Nested Loop Left Join  (cost=0.68..56512.74 rows=1481535 width=9) (actual time=0.039..5978.865 rows=1477585 loops=1)
  ->  Index Only Scan using cos_table1 on table1 t1 (cost=0.43..26881.79 rows=1481535 width=8) (actual time=0.011..179.682 rows=1477585 loops=1)
        Heap Fetches: 46760
  ->  Function Scan on does_t2_record_exist t2_record_exist (cost=0.25..0.26 rows=1 width=1) (actual time=0.004..0.004 rows=1 loops=1477585)
Planning Time: 0.065 ms
Execution Time: 6024.267 ms

Solution

  • You did not show actual query plans (output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS)), but there are several performance traps in your function. Rewrite as:

    CREATE OR REPLACE FUNCTION does_t2_record_exist(_object_id int8, _table_name text)
      RETURNS bool
      LANGUAGE sql STABLE PARALLEL SAFE AS  -- !
    $func$
    SELECT EXISTS (
       SELECT FROM table2 t2
       WHERE  t2.fk = _object_id
       AND    lower(t2.table_name) = _table_name
       );
    $func$;
    

    And your query then is simply:

    SELECT ...
         , does_t2_record_exist(t1.id, 't1') AS t2_record_exists
    FROM   table1 t1
    

    No need for the LATERAL join, once you have wrapped the correlated subquery into the function.

    The expression lower(t2.table_name) is not sargable. Be sure to have a (default) B-tree multicolumn, expression index on table2(fk, lower(tablename)). (Or just on one of both index fields if that is selective enough on its own.)

    You don't need PL/pgSQL for the simple function. Switching to LANGUAGE sql possibly allows function inlining.

    Add the function labels PARALLEL SAFE and STABLE (because they both actually apply!) to allow various performance optimizations.

    Still, if anything, the function can only be slower than putting the EXISTS expression with a correlated subquery into the SELECT list directly. Very little, though, if you do everything right.

    That said, a LATERAL join, or your function running a correlated subquery has to be executed once per qualifying row in the query. If that results in querying the majority of rows that a plain subquery would have to process, a join (not LATERAL) is still the fastest option. (Or your original query, which may result in the same query plan.) Like:

    SELECT ...
         , t2.fk IS NOT NULL AS t2_record_exists
    FROM   table1 t1
    LEFT   JOIN (
       SELECT DISTINCT t2.fk  -- do we even need DISTINCT?
       FROM   table2 t2
       WHERE  lower(t2.table_name) = 't1'
       ) t2 ON t2.fk = t1.id
    

    See:

    Added query plans reveal ...

    Your table table2 only has 372 rows, 113 of which are filtered. That's tiny. You won't need an index! Postgres gets it done with a sequential scan in one fell swoop for the original query.
    Wrapping the functionality into the black box of a PL/pgSQL function forced Postgres to execute it loops=1477585 times. That's bad.

    My rewritten version may fix this. Or revert to your original query. (Or my appended alternate.)