postgresqlindexingpartial-index

Will it make a noticeable different in performance/index size if my index is made partial?


Say I have a table with 20 million rows I want to index like so:

CREATE INDEX fruit_color
  ON fruits
  USING btree
  (color);

Now let's say that only 2% of the fruits have a color, rest will be NULL. My queries will NEVER want to find fruits with color NULL (no color), so the question is, will it make a difference for postgresql if I change the index to:

 CREATE INDEX fruit_color
  ON fruits
  USING btree
  (color)
  WHERE color IS NOT NULL;

I don't know much about postgresql's internal way of handling indexes, so this is why I ask.

PS postgresql version is 9.2


Solution

  • Yes, that will make a difference. How much of a difference depends on how the index is used.

    If there is only one fruit with a certain color, and you search for this fruit by color, it won't make much of a difference; maybe one less page will be accessed (because the index has maybe one level of depth less).

    If there are many fruits of a certain color, the improvement will be great, because it will be much cheaper to scan the whole index (for a bitmap index scan) or a bigger part of it (for a regular or index-only scan). If the index is big, PostgreSQL will be more reluctant to scan the complete index and will probably choose a sequential table scan instead.