I'm testing how joins work with a hash index in PostgreSQL 16.2. Here is a test table: just two columns holding numbers in text format.
create table join_test (
pk varchar(20),
fk varchar(20));
insert into join_test(pk, fk)
select s::varchar(20),
(10000001 - s)::varchar(20)
from generate_series(1, 10000000) as s;
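The row estimates in the plans below (rows=9999860) show that statistics had already been gathered. If you are reproducing this, an explicit ANALYZE after the bulk load (not shown in the original) makes that deterministic instead of waiting for autovacuum:
analyze join_test;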
Then I do a simple join.
explain analyze select *
from join_test t1
join join_test t2 on t1.pk = t2.fk;
Hash Join  (cost=327879.85..878413.95 rows=9999860 width=28) (actual time=5181.056..18337.596 rows=10000000 loops=1)
  Hash Cond: ((t1.pk)::text = (t2.fk)::text)
  ->  Seq Scan on join_test t1  (cost=0.00..154053.60 rows=9999860 width=14) (actual time=0.070..1643.618 rows=10000000 loops=1)
  ->  Hash  (cost=154053.60..154053.60 rows=9999860 width=14) (actual time=5147.801..5147.803 rows=10000000 loops=1)
        Buckets: 262144  Batches: 128  Memory Usage: 5691kB
        ->  Seq Scan on join_test t2  (cost=0.00..154053.60 rows=9999860 width=14) (actual time=0.024..2163.714 rows=10000000 loops=1)
Planning Time: 0.172 ms
Execution Time: 18718.586 ms
No surprises here: without indexes there is a Hash Join with a hash table constructed on the fly.
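The "Batches: 128" line above shows the hash table did not fit in work_mem and was split into on-disk batches. As a side experiment not in the original post, raising work_mem before re-running the query should bring it down to a single in-memory batch (the exact value needed here is a guess):
set work_mem = '1GB';
explain analyze select *
from join_test t1
join join_test t2 on t1.pk = t2.fk;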
Then I add a hash index on the "pk" column and run the same join again.
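For reference, judging by the index name in the plan below, the index would have been created with something like:
create index join_test_pk_idx on join_test using hash (pk);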
Nested Loop  (cost=0.00..776349.75 rows=9999860 width=28) (actual time=0.107..85991.520 rows=10000000 loops=1)
  ->  Seq Scan on join_test t2  (cost=0.00..154053.60 rows=9999860 width=14) (actual time=0.062..1399.400 rows=10000000 loops=1)
  ->  Index Scan using join_test_pk_idx on join_test t1  (cost=0.00..0.05 rows=1 width=14) (actual time=0.008..0.008 rows=1 loops=10000000)
        Index Cond: ((pk)::text = (t2.fk)::text)
        Rows Removed by Index Recheck: 0
Planning Time: 0.195 ms
Execution Time: 86490.687 ms
As I understand it, in this particular case the Nested Loop is not really a "nested loop" but rather algorithmically the same as a Hash Join, except that it uses the already constructed hash table from the index.
So what did you do to make this happen? In my hands, the nested loop is understandably far slower than the hash join. Did you do something silly, like set random_page_cost to zero?
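If it isn't obvious what was changed, one way to check the relevant planner cost settings is to query pg_settings (a suggestion of mine, not part of the original exchange):
select name, setting, source
from pg_settings
where name in ('random_page_cost', 'seq_page_cost', 'effective_cache_size', 'enable_hashjoin');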
> As I understand it, in this particular case, Nested Loop is not really a "nested loop" and rather algorithmically the same as Hash Join
You can play endless word games in which everything is "really" something else, but that quickly becomes meaningless, to the point where nothing means anything.
The hash join uses sequential scans to read the data and builds a private hash table which is designed to spill to disk efficiently if need be, so it is efficient in both IO and locking. The nested loop uses a shared disk-based structure in which every access might trigger inefficient random IO (which may or may not be satisfied by cache) and definitely involves some locking and buffer-management overhead. If you really want to know the performance details, you could dig into it with perf, for example.
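The answer doesn't spell out the perf invocation; a minimal sketch, assuming a Linux box with perf installed, is to find the backend's PID from the session you are profiling and attach to it from a shell:
-- in the psql session being profiled:
select pg_backend_pid();
-- then in a shell, with that pid substituted in:
--   perf record -g -p <pid>     (run the query while this is recording)
--   perf report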
In my hands, once I force everything to stay in main memory, much of the slowness of the nested loop (which is only about 2x slower for me, though the planner thinks it will be 6x slower) seems to be due to CPU cache misses; the perf annotation shows the slowest instruction being something which makes no sense other than as a CPU stall. I think the CPU cache gets some read-ahead benefit, at least on my VirtualBox VM, and the hash join is better at achieving this than the disk-based index is. But that might be a quirk of my "hardware"; I've found this type of analysis doesn't transfer well to other machines.
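The answer doesn't say how everything was kept in main memory; one common way (my assumption, not necessarily what was done here) is the pg_prewarm contrib module, which loads a relation's blocks into shared buffers, assuming shared_buffers is large enough to hold the table and index:
create extension if not exists pg_prewarm;
select pg_prewarm('join_test');
select pg_prewarm('join_test_pk_idx');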
There is little point in using hash indexes over btree indexes here. And even less point in tricking the planner into thinking the nested loop is actually a good idea for this case.
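For completeness, the btree alternative alluded to is just the default index type on the same column (hypothetical index name; you would presumably drop the hash index first):
create index join_test_pk_btree_idx on join_test (pk);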