I'm struggling to make sense of postgres EXPLAIN to figure out why my query is slow. Can someone help? This is my query, it's a pretty simple join:
SELECT DISTINCT graph_concepts.*
FROM graph_concepts
INNER JOIN graph_family_links
ON graph_concepts.id = graph_family_links.descendent_concept_id
WHERE graph_family_links.ancestor_concept_id = 1016
AND graph_family_links.generation = 1
AND graph_concepts.state != 2
It's starting from a concept and it's getting a bunch of related concepts through the links table.
Notably, I have an index on graph_family_links.descendent_concept_id, yet this query takes about 3 seconds to return a result. This is way too long for my purposes.
This is the SQL explain:
Unique (cost=46347.01..46846.16 rows=4485 width=108) (actual time=27.406..33.667 rows=13 loops=1)
Buffers: shared hit=13068 read=5
I/O Timings: read=0.074
-> Gather Merge (cost=46347.01..46825.98 rows=4485 width=108) (actual time=27.404..33.656 rows=13 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=13068 read=5
I/O Timings: read=0.074
-> Sort (cost=45347.01..45348.32 rows=2638 width=108) (actual time=23.618..23.621 rows=6 loops=2)
Sort Key: graph_concepts.id, graph_concepts.definition, graph_concepts.checkvist_task_id, graph_concepts.primary_question_id, graph_concepts.created_at, graph_concepts.updated_at, graph_concepts.tsn, graph_concepts.state, graph_concepts.search_phrases
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=13068 read=5
I/O Timings: read=0.074
Worker 0: Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=301.97..45317.02 rows=2638 width=108) (actual time=8.890..23.557 rows=6 loops=2)
Buffers: shared hit=13039 read=5
I/O Timings: read=0.074
-> Parallel Bitmap Heap Scan on graph_family_links (cost=301.88..39380.60 rows=2640 width=4) (actual time=8.766..23.293 rows=6 loops=2)
Recheck Cond: (ancestor_concept_id = 1016)
Filter: (generation = 1)
Rows Removed by Filter: 18850
Heap Blocks: exact=2558
Buffers: shared hit=12985
-> Bitmap Index Scan on index_graph_family_links_on_ancestor_concept_id (cost=0.00..301.66 rows=38382 width=0) (actual time=4.744..4.744 rows=47346 loops=1)
Index Cond: (ancestor_concept_id = 1016)
Buffers: shared hit=67
-> Index Scan using graph_concepts_pkey on graph_concepts (cost=0.08..2.25 rows=1 width=108) (actual time=0.036..0.036 rows=1 loops=13)
Index Cond: (id = graph_family_links.descendent_concept_id)
Filter: (state <> 2)
Buffers: shared hit=54 read=5
I/O Timings: read=0.074
Planning:
Buffers: shared hit=19
Planning Time: 0.306 ms
Execution Time: 33.747 ms
(35 rows)
I'm doing lots of googling to help me figure out how to read this EXPLAIN and I'm struggling. Can someone help translate this into plain english for me?
Answering myself (for the benefit of future people):
My question was primarily how to understand EXPLAIN. Many people below contributed to my understanding but no one really gave me the beginner unpacking I was looking for. I want to teach myself to fish rather than simply having other people read this and give me advice on solving this specific issue, although I do greatly appreciate the specific suggestions!
For others trying to understand EXPLAIN, this is the important context you need to know, which was holding me back:
cost=0.00..
This is the very start of your query execution. In my case, cost=0.00..301.66
is the first step and cost=0.08..2.25
runs in parallel (from step 0.08 to step 2.25, just a small fraction of the 0 to 300).cost=301.88..39380.60
. Although I was confused because I also have a cost=301.97..45317.02
. I think those are, again, both happening in parallel so I'm not sure which one is contributing more.This command displays the execution plan that the PostgreSQL planner generates for the supplied statement. The execution plan shows how the table(s) referenced by the statement will be scanned — by plain sequential scan, index scan, etc. — and if multiple tables are referenced, what join algorithms will be used to bring together the required rows from each input table.
Since you only do explain your select command
meaning the output is the system planner generate a execute plan for this select query. But if you do explain analyze then it will plan and execute the query.
SELECT DISTINCT
: https://www.postgresql.org/docs/current/sql-select.html
-After sort then since you specified key word DISTINCT then eliminate the duplicates.