I was comparing the performance of a normal index and partial index in Postgres and to my surprise, the partial index performance seems sub-optimal compared to the normal index.
The explain statement after the index creation:
CREATE INDEX sett_ked_ped_idx ON public.settlement USING btree (client_id, is_deleted, knowledge_end_date DESC, period_end_date)
SELECT "settlement"."payee_email_id"
FROM "settlement"
WHERE ("settlement"."client_id" = 3
AND NOT "settlement"."is_deleted"
AND "settlement"."knowledge_end_date" IS NULL
AND "settlement"."period_end_date" = '2023-06-30T23:59:59.999999+00:00'::timestamptz);
INDEX Scan USING sett_ked_ped_idx ON settlement (cost=0.56..7169.58 ROWS=3014 width=28) (actual TIME=1.485..2.284 ROWS=42 loops=1)
The query became performant but want to try a partial index since the fields is_deleted
and knowledge_end_date
will be of the same value and as a partial index can reduce the overall index size. To my surprise, the query became less performant with time and cost.
Explain statement after partial index creation:
CREATE INDEX sett_ked_ped_idx ON public.settlement USING btree (client_id, is_deleted, knowledge_end_date DESC, period_end_date) where not is_deleted and knowledge_end_date is null
INDEX Scan USING sett_ked_ped_idx ON settlement (cost=0.56..331556.57 ROWS=736 width=28) (actual TIME=602.011..1078.219 ROWS=42 loops=1)
I am not sure why this is the case. Shouldn't the query using the partial index be faster?
To make this a fair fight, drop the now redundant columns is_deleted
and knowledge_end_date
from the partial index:
CREATE INDEX sett_ked_ped_idx ON public.settlement (client_id, period_end_date)
WHERE NOT is_deleted AND knowledge_end_date IS NULL;
Then test again.
To note, a partial index typically only makes sense once you exclude a substantial share of all rows. If WHERE NOT is_deleted AND knowledge_end_date IS NULL
only excludes a few percent or less, don't go there.