I have a table with 1M records, 100k of which have NULL in colA. The remaining records have fairly distinct values. Is there a difference between creating a regular index on this column and a partial index with where colA is not null?
Since regular Postgres indexes do not store NULL values, wouldn't it be the same as creating a partial index with where colA is not null?
Any pros or cons to either index?
It all depends.
NULL values have been included in (default) B-tree indexes since Postgres 8.3, as Schwern pointed out. However, predicates like the one you mention (where colA is not null) have only been properly supported since Postgres 9.0. From the release notes:
Allow IS NOT NULL restrictions to use indexes (Tom Lane)
This is particularly useful for finding MAX()/MIN() values in indexes that contain many null values.
GIN indexes followed later:
As of PostgreSQL 9.1, null key values can be included in the index.
Typically, a partial index makes sense if it excludes a major part of the table, making the index substantially smaller and saving writes to it. Since B-tree indexes are so shallow, raw seek performance scales extremely well (once the index is cached), so 10 % fewer index entries hardly matter there.
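To put the depth argument into rough numbers, here is a back-of-envelope sketch. It is not how Postgres actually lays out an index: the fanout of 300 entries per page is an assumed round figure (the real value depends on key width, fillfactor and page overhead). It only counts how many levels a B-tree needs for the full index (all 1M rows) versus the partial index (the 900k non-null rows).

```python
# Back-of-envelope B-tree sizing; NOT Postgres' actual page layout.
# ASSUMPTION: ~300 index entries fit on one page at every level.
FANOUT = 300


def btree_levels(entries: int, fanout: int = FANOUT) -> int:
    """Levels (root to leaf) needed so that fanout ** levels >= entries."""
    levels, capacity = 1, fanout
    while capacity < entries:
        capacity *= fanout
        levels += 1
    return levels


def leaf_pages(entries: int, fanout: int = FANOUT) -> int:
    """Approximate number of leaf pages (ceiling division)."""
    return -(-entries // fanout)


FULL_INDEX = 1_000_000    # regular index: every row, NULLs included
PARTIAL_INDEX = 900_000   # partial index: WHERE colA IS NOT NULL

for name, n in (("full", FULL_INDEX), ("partial", PARTIAL_INDEX)):
    print(f"{name:8} entries={n:>9,} levels={btree_levels(n)} "
          f"leaf pages={leaf_pages(n):,}")
```

Under these assumptions both indexes need 3 levels, so a single seek touches the same number of pages; the partial index only saves roughly 10 % of the leaf pages (3,000 vs. 3,334).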
Your case would exclude only around 10 % of all rows, and that rarely pays off. A partial index adds some overhead for the query planner, and it cannot be used for queries that don't match the index condition. (The Postgres query planner doesn't try hard if the match is not immediately obvious.)
OTOH, Postgres will rarely use an index for predicates retrieving 10 % of the table (like colA IS NULL in your case); a sequential scan will typically be faster. Again, it depends.
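A crude page-count comparison illustrates why a predicate matching 10 % of the table is usually sequential-scan territory. This is emphatically not the planner's cost formula: only the two page-cost constants are real Postgres defaults (seq_page_cost and random_page_cost); the 100 rows per heap page and the "one random heap read per matching row" rule are illustrative assumptions that ignore caching, physical correlation and bitmap heap scans.

```python
# Crude page-count comparison: sequential scan vs. plain index scan.
# NOT the Postgres planner's formula. Only the two page-cost constants are
# real defaults; everything else is an illustrative ASSUMPTION:
#   - 100 rows per heap page,
#   - every matching row costs one random heap page read (no caching,
#     no correlation with the index order, no bitmap heap scans).
SEQ_PAGE_COST = 1.0      # Postgres default seq_page_cost
RANDOM_PAGE_COST = 4.0   # Postgres default random_page_cost

TOTAL_ROWS = 1_000_000
ROWS_PER_PAGE = 100                        # hypothetical
HEAP_PAGES = TOTAL_ROWS // ROWS_PER_PAGE   # 10,000 pages


def seq_scan_cost() -> float:
    """Read every heap page once, in physical order."""
    return HEAP_PAGES * SEQ_PAGE_COST


def index_scan_cost(rows_fetched: int) -> float:
    """One random heap page read per matching row (index pages ignored)."""
    return rows_fetched * RANDOM_PAGE_COST


def break_even_rows() -> int:
    """Largest row count for which the index scan is no more expensive."""
    return int(seq_scan_cost() // RANDOM_PAGE_COST)


queries = {
    "colA IS NULL": 100_000,           # ~10 % of the table
    "colA IS NOT NULL": 900_000,       # ~90 % of the table
    "colA = <one distinct value>": 1,  # highly selective
}
for predicate, rows in queries.items():
    print(f"{predicate:28} seq scan={seq_scan_cost():>9,.0f} "
          f"index scan={index_scan_cost(rows):>11,.0f}")
print("break-even rows:", break_even_rows())
```

In this toy model the index only wins below about 2,500 matching rows (0.25 % of the table). The exact threshold is an artifact of the made-up numbers, and bitmap scans and caching move it in practice, but if the NULL rows are spread across the table, the 100,000 rows matched by colA IS NULL are far beyond it.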
If (almost) all queries exclude NULL anyway, in a way the Postgres planner understands (for instance, colA = 123 implies colA IS NOT NULL, because the = operator is strict), then a partial index excluding only 10 % of all rows is still a sensible option. But it may backfire if query patterns change, and the added complexity may not be worth it.
It's also worth noting that there are still corner cases with NULL values in Postgres indexes. I recently bumped into a case where Postgres proved unwilling to read sorted rows from a multicolumn index when the first index expression was filtered with IS NULL (making a partial index preferable in that case):
db<>fiddle here
So, it depends on the complete picture.