postgresqlsql-execution-plan

Discrepancy between cost and row count estimates in nested loop


I have this snippet of a PostgreSQL explain plan I'm trying to comprehend:

 ->  Nested Loop  (cost=0.57..933455.16 rows=11 width=122) (actual time=3.710..497.990 rows=86102 loops=1)
       ->  CTE Scan on last_outputs  (cost=0.00..5.02 rows=251 width=172) (actual time=3.675..333.800 rows=1773 loops=1)
       ->  Index Scan using i_batch_company_outputs_file_md5 on batch_company_outputs bco2  (cost=0.57..3718.91 rows=1 width=65) (actual time=0.019..0.080 rows=49 loops=1773)
             Index Cond: ((file_md5)::text = (last_outputs.file_md5)::text)
             Filter: (((service_id)::text = 'sheetbuilder'::text) AND ((file_name)::text = 'output.txt'::text) AND ((last_outputs.company_id)::text = ((company_id)::text || ''::text)))
             Rows Removed by Filter: 9

Based on my understanding:

Despite this, there's still a relatively high cost estimate (933455.16) which doesn't seem to be explained by any of the operations within the loop. Why? Is there something else I can try to get a better view of where that number comes from?


Solution

  • I would not call 3718.91 a low cost estimate for one iteration of an index scan. I have no idea what is causing the high cost estimate, I would assume that file_md5 is not the first column in the index i_batch_company_outputs_file_md5 (despite the name suggesting it would be.). But since you said it is a single-column index, my next guess is that the planner thinks the index will match a lot of rows (like nearly 1000) on each iteration but then all but one of them will be eliminated by the filter. But to eliminate them with the filter, it first needs to fetch each row from the table, expecting to incur a random page cost per row. You can see that only 9 rows are removed by the filter (on average per iteration), but the plan doesn't give us any visibility into how many the planner thought would be removed by the filter. You could write some test queries to try to get the planner to divulge its inner thoughts here. But since two thirds of the time is taken up by the CTE, not by this part of the plan, this is probably not worth investigating further, other than just for curiosity sake.

    3718.91 * 251 = 933446.41, which does neatly explain the total cost estimate. There are discrepancies in the (partial) plan, but the questions you are asking are not relevant to what the actual discrepancies are.