databasepostgresqlquery-optimizationrelational-databasedatabase-indexes

Why is PostgreSQL sorting a seemingly already sorted result set?


I'm trying to optimize the following query for a school assignment:

SELECT
    DATE(b.book_date),
    SUM(b.total_amount) revenue,
    COUNT(DISTINCT(t.passenger_id)) count_passengers
FROM bookings b
JOIN tickets t ON t.book_ref = b.book_ref
GROUP BY
    DATE(b.book_date)
ORDER BY
    COUNT(DISTINCT(t.passenger_id)) DESC,
    SUM(b.total_amount) DESC;

Without creating any indexes, the scheduler gives me the following explain plan with EXPLAIN ANALYZE

|QUERY PLAN                                                                                                                                                          |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|Sort  (cost=20001452121.83..20001453243.21 rows=448552 width=44) (actual time=10862.729..10862.752 rows=392 loops=1)                                                |
|  Sort Key: (count(DISTINCT t.passenger_id)) DESC, (sum(b.total_amount)) DESC                                                                                       |
|  Sort Method: quicksort  Memory: 55kB                                                                                                                              |
|  ->  GroupAggregate  (cost=20001359986.85..20001396213.70 rows=448552 width=44) (actual time=8131.457..10862.442 rows=392 loops=1)                                 |
|        Group Key: (date(b.book_date))                                                                                                                              |
|        ->  Sort  (cost=20001359986.85..20001367361.50 rows=2949857 width=22) (actual time=8131.394..8329.844 rows=2949857 loops=1)                                 |
|              Sort Key: (date(b.book_date))                                                                                                                         |
|              Sort Method: external merge  Disk: 95136kB                                                                                                            |
|              ->  Merge Join  (cost=20000859819.02..20000921997.07 rows=2949857 width=22) (actual time=6064.572..7486.794 rows=2949857 loops=1)                     |
|                    Merge Cond: (b.book_ref = t.book_ref)                                                                                                           |
|                    ->  Sort  (cost=10000342915.67..10000348193.45 rows=2111110 width=21) (actual time=854.300..1138.458 rows=2111110 loops=1)                      |
|                          Sort Key: b.book_ref                                                                                                                      |
|                          Sort Method: external merge  Disk: 66024kB                                                                                                |
|                          ->  Seq Scan on bookings b  (cost=10000000000.00..10000034558.10 rows=2111110 width=21) (actual time=90.339..215.696 rows=2111110 loops=1)|
|                    ->  Sort  (cost=10000516903.35..10000524278.00 rows=2949857 width=19) (actual time=5210.197..5407.427 rows=2949857 loops=1)                     |
|                          Sort Key: t.book_ref                                                                                                                      |
|                          Sort Method: external sort  Disk: 95320kB                                                                                                 |
|                          ->  Seq Scan on tickets t  (cost=10000000000.00..10000078913.57 rows=2949857 width=19) (actual time=0.134..241.174 rows=2949857 loops=1)  |
|Planning Time: 0.121 ms                                                                                                                                             |
|JIT:                                                                                                                                                                |
|  Functions: 16                                                                                                                                                     |
|  Options: Inlining true, Optimization true, Expressions true, Deforming true                                                                                       |
|  Timing: Generation 0.834 ms, Inlining 7.090 ms, Optimization 51.113 ms, Emission 32.119 ms, Total 91.156 ms                                                       |
|Execution Time: 10890.239 ms                                                                                                                                        |

For optimizing the join operations, I'm creating the following indexes:

CREATE INDEX idx_bookings_bd_ta_bref ON bookings USING btree(book_date, total_amount, book_ref);
CREATE INDEX idx_tickets_bref ON tickets USING hash(book_ref);

The main idea is that the scheduler can perform the Join operation by iterating over the index over bookings, and acquiring the necessary rows from tickets for each bookings row through the has index, making the result set be already sorted by book_date after the join is finished and therefore eliminating the sort operation over the result set. After disabling seqscan and some indexes, I can finally get the scheduler to do what I want, which leads to the following EXPLAIN ANALYZE output

|QUERY PLAN                                                                                                                                                                        |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|Sort  (cost=966377.54..967498.92 rows=448552 width=44) (actual time=6961.514..6961.535 rows=392 loops=1)                                                                          |
|  Sort Key: (count(DISTINCT t.passenger_id)) DESC, (sum(b.total_amount)) DESC                                                                                                     |
|  Sort Method: quicksort  Memory: 55kB                                                                                                                                            |
|  ->  GroupAggregate  (cost=874242.56..910469.41 rows=448552 width=44) (actual time=4283.854..6961.216 rows=392 loops=1)                                                          |
|        Group Key: (date(b.book_date))                                                                                                                                            |
|        ->  Sort  (cost=874242.56..881617.20 rows=2949857 width=22) (actual time=4283.792..4475.416 rows=2949857 loops=1)                                                         |
|              Sort Key: (date(b.book_date))                                                                                                                                       |
|              Sort Method: external merge  Disk: 95136kB                                                                                                                          |
|              ->  Nested Loop  (cost=0.43..436252.77 rows=2949857 width=22) (actual time=64.342..3727.674 rows=2949857 loops=1)                                                   |
|                    ->  Index Only Scan using idx_bookings_bd_ta_bref on bookings b  (cost=0.43..73467.08 rows=2111110 width=21) (actual time=0.049..172.735 rows=2111110 loops=1)|
|                          Heap Fetches: 0                                                                                                                                         |
|                    ->  Index Scan using idx_tickets_bref on tickets t  (cost=0.00..0.15 rows=2 width=19) (actual time=0.001..0.001 rows=1 loops=2111110)                         |
|                          Index Cond: (book_ref = b.book_ref)                                                                                                                     |
|                          Rows Removed by Index Recheck: 0                                                                                                                        |
|Planning Time: 0.139 ms                                                                                                                                                           |
|JIT:                                                                                                                                                                              |
|  Functions: 10                                                                                                                                                                   |
|  Options: Inlining true, Optimization true, Expressions true, Deforming true                                                                                                     |
|  Timing: Generation 0.448 ms, Inlining 6.415 ms, Optimization 34.103 ms, Emission 23.769 ms, Total 64.734 ms                                                                     |
|Execution Time: 6971.385 ms                                                                                                                                                       |

The part that confuses me is that, despite the result set already being sorted by book_date, the scheduler insists on performing an external disk sort operation anyways, which is considerably slowing things down. Why is PostgreSQL sorting something that should already be sorted? How can I prevent it from doing so? Note that I don't want the answer to the problem, I just want to know why the scheduler is doing something it really shouldn't need to do.

In case it's relevant, I'm working with PostgreSQL version 14 and the book_date column that's causing issues is a timestampz column.


Solution

  • The sort is required because the query groups by DATE(b.book_date), but that expression is not the leading key of an index on bookings. The query planner determined that an index scan would be more efficient that a table scan, but it isn't able to use the key b.book_date as a surrogate for ordering by DATE(b.book_date). Changing the type of book_date from TIMESTAMPTZ to DATE and then grouping by b.book_date would likely eliminate the additional sort. Changing the index to a functional index by replacing book_date with DATE(book_date) as the leading key is also likely to achieve the desired result.