Say I have a table with 20 million rows I want to index like so:
CREATE INDEX fruit_color
ON fruits
USING btree
(color);
Now let's say that only 2% of the fruits have a color, rest will be NULL
. My queries will NEVER want to find fruits with color NULL
(no color), so the question is, will it make a difference for postgresql if I change the index to:
CREATE INDEX fruit_color
ON fruits
USING btree
(color)
WHERE color IS NOT NULL;
I don't know much about postgresql's internal way of handling indexes, so this is why I ask.
PS postgresql version is 9.2
Yes, that will make a difference. How much of a difference depends on how the index is used.
If there is only one fruit with a certain color, and you search for this fruit by color, it won't make much of a difference; maybe one less page will be accessed (because the index has maybe one level of depth less).
If there are many fruits of a certain color, the improvement will be great, because it will be much cheaper to scan the whole index (for a bitmap index scan) or a bigger part of it (for a regular or index-only scan). If the index is big, PostgreSQL will be more reluctant to scan the complete index and will probably choose a sequential table scan instead.