I am dealing with a PostgreSQL (v14) query of this kind
SELECT
...,
EXISTS (
SELECT 1
FROM table2 t2
WHERE t2.fk = t1.id
AND LOWER(t2.table_name) = 't1'
) AS t2_record_exists
FROM table1 t1;
and was hoping to extract the logic in a function to be used in lateral join, in order to compute that field in a more readable way (as it is to be used in a view).
This is the resulting query
SELECT
...,
t2_record_exists.t2_record_exists
FROM table1 t1
LEFT JOIN LATERAL does_t2_record_exist(t1.id, 't1') t2_record_exists(t2_record_exists) ON TRUE;
That makes use of this function
CREATE OR REPLACE FUNCTION does_t2_record_exist(object_id int8, _table_name text)
RETURNS bool
LANGUAGE plpgsql
AS $function$
BEGIN
RETURN EXISTS (
SELECT 1
FROM table2 t2
WHERE t2.fk = object_id
AND LOWER(t2.table_name) = _table_name
);
END $function$;
The second query suffers a severe performance loss, as it executes in about 6000ms, while the first gets it done in 300ms.
I don't know why this would happen, as I naively assumed that the very same operation (the EXISTS subquery) would be executed the same amount of times (once per row).
What is going wrong here? How can one foresee such performance issues beforehand?
EDIT: Here are the query plans (obtained with EXPLAIN ANALYZE
) for my specific case
Query 1 (with subquery):
Index Only Scan using cos_table1 on table1 t1 (cost=0.43..3723311.61 rows=1481535 width=9) (actual time=56.299..247.674 rows=1477585 loops=1)
Heap Fetches: 46760
SubPlan 2
-> Seq Scan on table2 t2 (cost=0.00..15.27 rows=2 width=8) (actual time=11.300..11.454 rows=372 loops=1)
Filter: (lower(table_name) = 't1'::text)
Rows Removed by Filter: 113
Planning Time: 0.085 ms
JIT:
Functions: 13
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.769 ms, Inlining 17.544 ms, Optimization 24.557 ms, Emission 13.652 ms, Total 56.524 ms
Execution Time: 276.285 ms
Query 2 (with LATERAL JOIN):
Nested Loop Left Join (cost=0.68..56512.74 rows=1481535 width=9) (actual time=0.039..5978.865 rows=1477585 loops=1)
-> Index Only Scan using cos_table1 on table1 t1 (cost=0.43..26881.79 rows=1481535 width=8) (actual time=0.011..179.682 rows=1477585 loops=1)
Heap Fetches: 46760
-> Function Scan on does_t2_record_exist t2_record_exist (cost=0.25..0.26 rows=1 width=1) (actual time=0.004..0.004 rows=1 loops=1477585)
Planning Time: 0.065 ms
Execution Time: 6024.267 ms
You did not show actual query plans (output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
), but there are several performance traps in your function. Rewrite as:
CREATE OR REPLACE FUNCTION does_t2_record_exist(_object_id int8, _table_name text)
RETURNS bool
LANGUAGE sql STABLE PARALLEL SAFE AS -- !
$func$
SELECT EXISTS (
SELECT FROM table2 t2
WHERE t2.fk = _object_id
AND lower(t2.table_name) = _table_name
);
$func$;
And your query then is simply:
SELECT ...
, does_t2_record_exist(t1.id, 't1') AS t2_record_exists
FROM table1 t1
No need for the LATERAL
join, once you have wrapped the correlated subquery into the function.
The expression lower(t2.table_name)
is not sargable. Be sure to have a (default) B-tree multicolumn, expression index on table2(fk, lower(tablename))
. (Or just on one of both index fields if that is selective enough on its own.)
You don't need PL/pgSQL for the simple function. Switching to LANGUAGE sql
possibly allows function inlining.
Add the function labels PARALLEL SAFE
and STABLE
(because they both actually apply!) to allow various performance optimizations.
Still, if anything, the function can only be slower than putting the EXISTS
expression with a correlated subquery into the SELECT
list directly. Very little, though, if you do everything right.
That said, a LATERAL
join, or your function running a correlated subquery has to be executed once per qualifying row in the query. If that results in querying the majority of rows that a plain subquery would have to process, a join (not LATERAL
) is still the fastest option. (Or your original query, which may result in the same query plan.) Like:
SELECT ...
, t2.fk IS NOT NULL AS t2_record_exists
FROM table1 t1
LEFT JOIN (
SELECT DISTINCT t2.fk -- do we even need DISTINCT?
FROM table2 t2
WHERE lower(t2.table_name) = 't1'
) t2 ON t2.fk = t1.id
See:
Your table table2
only has 372 rows, 113 of which are filtered. That's tiny. You won't need an index! Postgres gets it done with a sequential scan in one fell swoop for the original query.
Wrapping the functionality into the black box of a PL/pgSQL function forced Postgres to execute it loops=1477585
times. That's bad.
My rewritten version may fix this. Or revert to your original query. (Or my appended alternate.)