I have this very simple query, generated by my ORM (Entity Framework Core):
SELECT *
FROM "table1" AS "t1"
WHERE EXISTS (
SELECT 1
FROM "table2" AS "t2"
WHERE ("t2"."is_active" = TRUE) AND ("t1"."table2_id" = "t2"."id"))
ORDER BY "t1"."table2_id"
SET STATISTICS 10000
was set to all of the 3 columns.VACUUM FULL ANALYZE
was run on both tables.WITHOUT the ORDER BY
clause, the query returns within a few milliseconds, and I’d expect nothing else for 4 records to return. EXPLAIN output:
Nested Loop (cost=1.13..13.42 rows=103961024 width=121)
-> Index Scan using table2_is_active_idx on table2 (cost=0.56..4.58 rows=1 width=8)
Index Cond: (is_active = true)
Filter: is_active
-> Index Scan using table1_table2_id_fkey on table1 t1 (cost=0.57..8.74 rows=10 width=121)
Index Cond: (table2_id = table1.id)
WITH the ORDER BY
clause, the query takes 5 minutes to complete! EXPLAIN output:
Merge Semi Join (cost=10.95..4822984.67 rows=103961040 width=121)
Merge Cond: (t1.table2_id = t2.id)
-> Index Scan using table1_table2_id_fkey on table1 t1 (cost=0.57..4563070.61 rows=103961040 width=121)
-> Sort (cost=4.59..4.59 rows=2 width=8)
Sort Key: t2.id
-> Index Scan using table2_is_active_idx on table2 a (cost=0.56..4.58 rows=2 width=8)
Index Cond: (is_active = true)
Filter: is_active
The inner, first index scan should return no more than 2 rows. Then the outer, second index scan doesn't make any sense with its cost of 4563070 and 103961040 rows. It only has to match 2 rows in table2
with 4 rows in table1
!
This is a very simple query with very few records to return. Why is Postgres failing to execute it properly?
Ok I solved my problem in the most unexpected way. I upgraded Postgresql from 9.6.1 to 9.6.3. And that was it. After restarting the service, the explain plan now looked good and the query ran just fine this time. I did not change anything, no new index, nothing. The only explanation I can think of is that there is was a query planner bug in 9.6.1 and solved in 9.6.3. Thank you all for your answers!