[SOLVED] HASH indexes in PostgreSQL

HASH indexes in PostgreSQL

I'm revisiting an old question from 16 years ago about PostgreSQL HASH indexes vs B-TREEs. At the time, it seemed like HASH indexes were far less efficient than B-TREEs in terms of both space and time and the consensus was that HASH indexes offered no significant performance advantage in many use cases.

I'm curious if the situation has changed in 2024? Comment to this answer (and the linked doc) suggests that there have been some optimizations to HASH indexes in PostgreSQL that make them viable. Are they significant?

Main question:

What are the current best practices regarding when to use a HASH index over a B-TREE in Postgres?

Additional considerations:

Are there specific pitfalls to be aware of with HASH indexes, particularly on:

BOOL columns?
Columns with very few unique values (e.g., HASH indexes where only 3 distinct values exist)?

One of the comments in the original thread stated, "In fact, using the index for something that'd match 99% of rows would be massively inefficient, much slower than seqscan." Does this still hold true today, or has the landscape shifted with modern optimizations?

Would love to hear about your experiences and advice regarding HASH indexes in PostgreSQL in 2024!

Solution

The one thing that has changed with hash indexes is that they have become crash safe with PostgreSQL v10. Now you don't have to rebuild the index after an operating system crash.

Other than that, nothing much has changed. Hash indexes continue to receive less love than B-tree indexes, so they usually won't outperform B-tree indexes. If you have a column with many identical values, they are necessarily inefficient, but during my experiments in 2023 I didn't find any realistic case where hash indexes outperformed B-tree indexes.

My advice is to forget about hash indexes.