postgresql

How can I ensure efficient partition pruning when using dates from subqueries?


I use a table with a section based on date. The primary key index is built on the id and open_date.

If I use the date from the subquery, the planner will search through all partitions.

> EXPLAIN analyze SELECT * 
FROM calendar wc, tab t 
WHERE t.open_date =wc.wrk_day  
 AND wc.on_date ='2024-10-30'
 
QUERY PLAN                                                                                                                                       |
-------------------------------------------------------------------------------------------------------------------------------------------------+
Hash JOIN  (cost=2.38..9602.67 ROWS=1099 width=85) (actual TIME=85.080..89.473 ROWS=41 loops=1)                                                  |
  Hash Cond: (t.open_date = wc.wrk_day)                                                                                                |
  ->  Append  (cost=0.00..8758.21 ROWS=316133 width=69) (actual TIME=0.590..54.404 ROWS=315957 loops=1)                                          |
        ->  Seq Scan ON t_2024_02_01 t_1  (cost=0.00..0.00 ROWS=1 width=596) (actual TIME=0.060..0.061 ROWS=0 loops=1)           |
        ->  Seq Scan ON t_2024_02_02 t_2  (cost=0.00..0.00 ROWS=1 width=596) (actual TIME=0.012..0.012 ROWS=0 loops=1)           |
        ->  Seq Scan ON t_2024_02_03 t_3  (cost=0.00..0.00 ROWS=1 width=596) (actual TIME=0.008..0.008 ROWS=0 loops=1)           |
        ->  Seq Scan ON t_2024_02_04 t_4  (cost=0.00..0.00 ROWS=1 width=596) (actual TIME=0.007..0.007 ROWS=0 loops=1)           |
...
...
->  Hash  (cost=2.37..2.37 ROWS=1 width=16) (actual TIME=0.021..0.021 ROWS=1 loops=1)                                                         |
        Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                                                            |
        ->  INDEX Scan USING calendar_pkey ON calendar wc  (cost=0.15..2.37 ROWS=1 width=16) (actual TIME=0.008..0.009 ROWS=1 loops=1)|
              INDEX Cond: (on_date = '2024-10-30'::DATE)

IIf I input the date "October 30, 2024", the planner will filter out relevant partitions based on that date.

EXPLAIN analyze 
SELECT * 
FROM calendar wc, tab t 
WHERE t.open_date=wc.wrk_day  
 AND wc.on_date ='2024-10-30' AND t.open_date>='2024-10-29' AND t.open_date<'2024-11-01'
 
QUERY PLAN                                                                                                                                      |
------------------------------------------------------------------------------------------------------------------------------------------------+
Hash JOIN  (cost=2.38..33.53 ROWS=3 width=85) (actual TIME=0.037..0.320 ROWS=41 loops=1)                                                        |
  Hash Cond: (t.open_date = wc.wrk_day)                                                                                               |
  ->  Append  (cost=0.00..28.90 ROWS=845 width=69) (actual TIME=0.012..0.225 ROWS=845 loops=1)                                                  |
        ->  Seq Scan ON t_2024_10_29 t_1  (cost=0.00..1.61 ROWS=41 width=69) (actual TIME=0.011..0.016 ROWS=41 loops=1)         |
              FILTER: ((open_date >= '2024-10-29'::DATE) AND (open_date < '2024-11-01'::DATE))                                            |
        ->  Seq Scan ON t_2024_10_30 t_2  (cost=0.00..1.87 ROWS=58 width=69) (actual TIME=0.005..0.011 ROWS=58 loops=1)         |
              FILTER: ((open_date >= '2024-10-29'::DATE) AND (open_date < '2024-11-01'::DATE))                                            |
        ->  Seq Scan ON t_2024_10_31 t_3  (cost=0.00..21.19 ROWS=746 width=69) (actual TIME=0.006..0.135 ROWS=746 loops=1)      |
              FILTER: ((open_date >= '2024-10-29'::DATE) AND (open_date < '2024-11-01'::DATE))                                            |
  ->  Hash  (cost=2.37..2.37 ROWS=1 width=16) (actual TIME=0.015..0.016 ROWS=1 loops=1)                                                         |
        Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                                                            |
        ->  INDEX Scan USING calendar_pkey ON calendar wc  (cost=0.15..2.37 ROWS=1 width=16) (actual TIME=0.012..0.013 ROWS=1 loops=1)|
              INDEX Cond: (on_date = '2024-10-30'::DATE)                                                                                        |
Planning TIME: 0.408 ms                                                                                                                         |
Execution TIME: 0.356 ms                                                                                                                        |
 
15 ROW(s) fetched.

I am looking for a way to use the date from a subquery and apply a filter.

DDL tab

CREATE TABLE usr.tab (
    id int8 NOT NULL,
    open_date date NOT NULL,
    system_date timestamp NULL,
    amount numeric(19, 2) NULL
    CONSTRAINT tab_pkey PRIMARY KEY (id, open_date)
)
PARTITION BY RANGE (open_date);
CREATE INDEX tab_on_banking_date ON ONLY usr.tab USING btree (open_date);
CREATE TABLE usr.tab_2024_02 PARTITION OF usr.tab FOR VALUES FROM ('2024-02-01') TO ('2024-03-01')
PARTITION BY RANGE (open_date);
CREATE TABLE tab_2024_02_01 PARTITION OF tab_2024_02  FOR VALUES FROM ('2024-02-01') TO ('2024-02-02');

query with filter, it doesn't help

explain analyze 
    select
    wc.prev_work_day d_f_w 
    from work_calendar wc 
    left join account_document ad on ad.banking_date = wc.prev_work_day
    where wc.on_date = '2024-10-30' 

QUERY PLAN                                                                                                                                                                                                |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Hash Right Join  (cost=2.38..7957.26 rows=1099 width=4) (actual time=71.377..75.512 rows=41 loops=1)                                                                                                      |
  Hash Cond: (ad.banking_date = wc.prev_work_day)                                                                                                                                                         |
  ->  Append  (cost=0.00..7112.81 rows=316133 width=4) (actual time=0.540..54.097 rows=315957 loops=1)                                                                                                    |
        ->  Seq Scan on account_document_2024_02_01 ad_1  (cost=0.00..0.00 rows=1 width=4) (actual time=0.046..0.046 rows=0 loops=1)                                                                      |
        ->  Seq Scan on account_document_2024_02_02 ad_2  (cost=0.00..0.00 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=1)                                                                      |
        ->  Seq Scan on account_document_2024_02_03 ad_3  (cost=0.00..0.00 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=1)                                                                      |
        
etc...


Solution

  • I found another solution, to use an immutable function.

    CREATE OR REPLACE FUNCTION 
    work_day(DATE) RETURNS DATE AS $$
    SELECT wc.prev_work_day FROM calendar wc WHERE wc.on_date= ($1)::DATE;
    $$ LANGUAGE SQL IMMUTABLE;
    
    EXPLAIN analyze
    SELECT  *
    FROM calendar wc 
    LEFT JOIN tab t ON t.open_date=work_day('2024-10-30')
    WHERE wc.on_date = '2024-10-30' 
    
    QUERY PLAN           
                                                                                                                               
    LIMIT  (cost=0.15..4.29 ROWS=41 width=85) (actual TIME=0.018..0.041 ROWS=41 loops=1)                                                             
    ->  Nested Loop LEFT JOIN  (cost=0.15..4.29 ROWS=41 width=85) (actual TIME=0.016..0.035 ROWS=41 loops=1)                                      |
        ->  INDEX Scan USING calendar_pkey ON calendar wc  (cost=0.15..2.37 ROWS=1 width=16) (actual TIME=0.005..0.006 ROWS=1 loops=1)|
              INDEX Cond: (on_date = '2024-10-30'::DATE)                                                                                        |
        ->  Seq Scan ON tab_2024_10_29 t  (cost=0.00..1.51 ROWS=41 width=69) (actual TIME=0.009..0.017 ROWS=41 loops=1)           |
              FILTER: (open_date = '2024-10-29'::DATE)                                                                                       |
     Planning TIME: 0.325 ms                                                                                                                         |
     Execution TIME: 0.070 ms                                                                                                                        |
    
     8 ROW(s) fetched.
    

    Cost from

    Hash Right Join  (cost=2.38..9602.70 rows=1099 width=85) (actual time=82.200..86.531 rows=41 loops=1)
    

    To

    Nested Loop Left Join  (cost=0.15..4.29 rows=41 width=85) (actual time=0.017..0.036 rows=41 loops=1)