sql postgresql performance query-optimization postgresql-15

Optimizing simple but slow query with OR condition

Some background: I'm running the following simple select statement on Postgres 15.3 server with 128GB memory, and contrary to my belief it takes ~6 minutes. The statement in involve two relations, big_table_70m with approximately 70M rows and other_table_50m with a bit less than 50M rows. In both tables I've Btree indexes on esat_last_modified and last_modified.

This takes 6 minutes to return:

select
    count(*)
FROM
    big_table_70m
where
    big_table_70m.esat_last_modified > (
        select
            max(esat_last_modified)
        from
            other_table_50m
    )
    OR big_table_70m.task_last_modified > (
        select
            max(last_modified)
        from
            other_table_50m
    );

However when dropping the OR condition and performing the query on either side, it returns very fast (>70ms):

select
    count(*)
FROM
    big_table_70m
where
    big_table_70m.esat_last_modified > (
        select
            max(esat_last_modified)
        from
            other_table_50m
    );

I've tried to add multi-column index on (esat_last_modified,last_modified) to make it more efficient without a success.

Can you please help me improve this query. Eventually the return count should be few thousands of rows.

explain analyze:

                                                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1765894.63..1765894.64 rows=1 width=8) (actual time=403477.117..403477.314 rows=1 loops=1)
   Buffers: shared hit=77762962 read=848381 dirtied=27535
   I/O Timings: shared/local read=380517.629
   InitPlan 2 (returns $1)
     ->  Result  (cost=0.70..0.71 rows=1 width=8) (actual time=0.028..0.029 rows=1 loops=1)
           Buffers: shared hit=6
           InitPlan 1 (returns $0)
             ->  Limit  (cost=0.56..0.70 rows=1 width=8) (actual time=0.026..0.027 rows=1 loops=1)
                   Buffers: shared hit=6
                   ->  Index Only Scan Backward using f9b1917d68a5656ca14b2f5e12447c6a on other_table_50m  (cost=0.56..4671638.13 rows=33810205 width=8) (actual time=0.025..0.026 rows=1 loops=1)
                         Index Cond: (esat_last_modified IS NOT NULL)
                         Heap Fetches: 1
                         Buffers: shared hit=6
   InitPlan 4 (returns $3)
     ->  Result  (cost=0.67..0.68 rows=1 width=8) (actual time=0.013..0.014 rows=1 loops=1)
           Buffers: shared hit=5
           InitPlan 3 (returns $2)
             ->  Limit  (cost=0.56..0.67 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=1)
                   Buffers: shared hit=5
                   ->  Index Only Scan Backward using "429b05d0a626f21b4708244ab67da408" on other_table_50m other_table_50m_1  (cost=0.56..5092889.03 rows=47176591 width=8) (actual time=0.013..0.013 rows=1 loops=1)
                         Index Cond: (last_modified IS NOT NULL)
                         Heap Fetches: 1
                         Buffers: shared hit=5
   ->  Gather  (cost=1765893.02..1765893.23 rows=2 width=8) (actual time=403477.112..403477.306 rows=1 loops=1)
         Workers Planned: 2
         Params Evaluated: $1, $3
         Workers Launched: 0
         Buffers: shared hit=77762962 read=848381 dirtied=27535
         I/O Timings: shared/local read=380517.629
         ->  Partial Aggregate  (cost=1764893.02..1764893.03 rows=1 width=8) (actual time=403476.429..403476.429 rows=1 loops=1)
               Buffers: shared hit=77762951 read=848381 dirtied=27535
               I/O Timings: shared/local read=380517.629
               ->  Parallel Index Only Scan using multi_column_index on big_table_70m  (cost=0.57..1724191.80 rows=16280489 width=0) (actual time=403476.425..403476.425 rows=0 loops=1)
                     Filter: ((esat_last_modified > $1) OR (task_last_modified > $3))
                     Rows Removed by Filter: 70394276
                     Heap Fetches: 6894859
                     Buffers: shared hit=77762951 read=848381 dirtied=27535
                     I/O Timings: shared/local read=380517.629
 Planning:
   Buffers: shared hit=352
 Planning Time: 0.949 ms
 Execution Time: 403477.593 ms
(42 rows)

Here is a link to depesz It shows that significant amount of time is due to IO (reading from the disk).

Added multi-column index on big_table_70m without a success. I've also rewrite the select to use CTE but obviously it didn't make any change to the plan.

I expect the statement to return in the same speed as without the OR condition.

EDIT: Adding explain plan for the query without OR (added one but it's the same for both):

                                                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1322659.98..1322659.99 rows=1 width=8) (actual time=7.493..8.952 rows=1 loops=1)
   Buffers: shared read=10
   I/O Timings: shared/local read=6.931
   InitPlan 2 (returns $1)
     ->  Result  (cost=0.70..0.71 rows=1 width=8) (actual time=3.905..3.906 rows=1 loops=1)
           Buffers: shared read=6
           I/O Timings: shared/local read=3.872
           InitPlan 1 (returns $0)
             ->  Limit  (cost=0.56..0.70 rows=1 width=8) (actual time=3.902..3.903 rows=1 loops=1)
                   Buffers: shared read=6
                   I/O Timings: shared/local read=3.872
                   ->  Index Only Scan Backward using f9b1917d68a5656ca14b2f5e12447c6a on other_table_50m  (cost=0.56..4686718.49 rows=33918614 width=8) (actual time=3.901..3.901 rows=1 loops=1)
                         Index Cond: (esat_last_modified IS NOT NULL)
                         Heap Fetches: 1
                         Buffers: shared read=6
                         I/O Timings: shared/local read=3.872
   ->  Gather  (cost=1322659.05..1322659.26 rows=2 width=8) (actual time=7.447..8.946 rows=3 loops=1)
         Workers Planned: 2
         Params Evaluated: $1
         Workers Launched: 2
         Buffers: shared read=10
         I/O Timings: shared/local read=6.931
         ->  Partial Aggregate  (cost=1321659.05..1321659.06 rows=1 width=8) (actual time=1.289..1.289 rows=1 loops=3)
               Buffers: shared read=4
               I/O Timings: shared/local read=3.059
               ->  Parallel Index Only Scan using big_table_70m_esat_last_modified on big_table_70m  (cost=0.57..1297144.84 rows=9805685 width=0) (actual time=1.287..1.287 rows=0 loops=3)
                     Index Cond: (esat_last_modified > $1)
                     Heap Fetches: 0
                     Buffers: shared read=4
                     I/O Timings: shared/local read=3.059
 Planning:
   Buffers: shared hit=328 read=9
   I/O Timings: shared/local read=8.097
 Planning Time: 9.082 ms
 Execution Time: 9.073 ms
(35 rows)

Solution

OR is often very difficult for an optimizer to deal with, as it cannot use an index on either column as that doesn't take into account the other.

Sometimes it can do an Index Union, but it often doesn't work that out. So you need to give it some help.

Use a UNION on the primary keys of the table, which should in theory allow it to use an efficient Merge Union, then aggregate again.

Note that you must use UNION not UNION ALL otherwise you could count a row twice.

select
    count(*)
from (
    select bg.PrimaryKeyHere
    from big_table_70m bg
    where
        bg.esat_last_modified > (
            select
                max(o.esat_last_modified)
            from
                other_table_50m o
        )

    union

    select bg.PrimaryKeyHere
    from big_table_70m bg
    where
        bg.task_last_modified > (
            select
                max(o.last_modified)
            from
                other_table_50m o
        )
) bg;

For this to work efficiently, you are going to want the following indexes:

big_table_70m (esat_last_modified)
big_table_70m (task_last_modified)
other_table_50m (esat_last_modified)
other_table_50m (last_modified)

You can add other columns into the key or the INCLUDE of the indexes, but those columns should be first, and should not be merged together into a single index.