How to force PostgreSQL to use index on joined table column?

I have two index :

player_team_id_index on player(team_id)
team_active_index on team(active)

I want to get all players belonging to an active team.

Query 1

explain analyze 
select p.* 
from player p 
inner join team t on t.id = p.team_id
where t.active is true;

This query do not use my index, it does a sequential scan on player table (10 000 rows).

Query 2

explain analyze 
select p.*
from player p
where p.team_id in (select id from team t where t.active is true);

Same result as the first query, it does a sequential scan on player table (10 000 rows) without using player_team_id_idx index.

Query 3

select t.id from team t where t.active is true;

There is only one active team, the result is 1

explain analyze 
select p.*
from player p
where p.team_id in (1);

This query uses player_team_id_idx index which results in much faster execution time.

How can I force PostgreSQL to use player_team_id_idx for Query 1 and Query 2 ? (I'm using PostgreSQL 16.9)

Solution

Make sure to analyze (and possibly vacuum too), or let the autovacuum daemon do it:

ANALYZE collects statistics about the contents of tables in the database, and stores the results in the pg_statistic system catalog. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.

Your indexes can be optimised but they are already useful and in my test, vacuum analyze immediately fixes the problem, equipping the planner with the stats it needs to consider the index scans: _{demo at db<>fiddle}

explain analyze 
select p.* 
from player p 
inner join team t on t.id = p.team_id
where t.active is true;

QUERY PLAN
Hash Join (cost=512.00..2347.04 rows=40000 width=532) (actual time=2.466..19.350 rows=95 loops=1)
Hash Cond: (p.team_id = t.id)
-> Seq Scan on player p (cost=0.00..1625.00 rows=80000 width=532) (actual time=0.012..9.056 rows=80000 loops=1)
-> Hash (cost=387.00..387.00 rows=10000 width=8) (actual time=2.050..2.052 rows=24 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 129kB
-> Seq Scan on team t (cost=0.00..387.00 rows=10000 width=8) (actual time=0.115..2.042 rows=24 loops=1)
Filter: (active IS TRUE)
Rows Removed by Filter: 19976
Planning Time: 0.598 ms
Execution Time: 19.389 ms

vacuum analyze team;
vacuum analyze player;

explain analyze 
select p.* 
from player p 
inner join team t on t.id = p.team_id
where t.active is true;

QUERY PLAN
Nested Loop (cost=0.58..470.89 rows=96 width=49) (actual time=0.018..0.194 rows=95 loops=1)
-> Index Scan using team_active_index on team t (cost=0.29..9.23 rows=24 width=8) (actual time=0.009..0.031 rows=24 loops=1)
Index Cond: (active = true)
-> Index Scan using player_team_id_index on player p (cost=0.29..19.20 rows=4 width=49) (actual time=0.003..0.006 rows=4 loops=24)
Index Cond: (team_id = t.id)
Planning Time: 0.446 ms
Execution Time: 0.220 ms

Consider a compound, multiple-key-column index create index on team(active,team_id);, or a partial index create index on team(id)where(active); and move t.active from where to join..on condition list:

drop index team_active_index;
create index team_active_index on team(active,id);

vacuum analyze team;

explain analyze 
select p.* 
from player p 
inner join team t on t.id = p.team_id and t.active;

QUERY PLAN
Nested Loop (cost=0.58..466.37 rows=96 width=49) (actual time=0.019..0.176 rows=95 loops=1)
-> Index Only Scan using team_active_index on team t (cost=0.29..4.71 rows=24 width=8) (actual time=0.009..0.013 rows=24 loops=1)
Index Cond: (active = true)
Heap Fetches: 0
-> Index Scan using player_team_id_index on player p (cost=0.29..19.20 rows=4 width=49) (actual time=0.003..0.006 rows=4 loops=24)
Index Cond: (team_id = t.id)
Planning Time: 0.260 ms
Execution Time: 0.216 ms

Note the partial index doesn't need to use (active is TRUE) as a filter or an index scan condition, because the db knows from the predicate all tuples in this index meet that condition. This index is also smaller.

drop index team_active_index;
create unique index team_active_index on team(id)where(active);

vacuum analyze team;

explain analyze 
select p.* 
from player p 
inner join team t on t.id = p.team_id and t.active;

QUERY PLAN
Nested Loop (cost=0.43..470.16 rows=96 width=49) (actual time=0.014..0.161 rows=95 loops=1)
-> Index Only Scan using team_active_index on team t (cost=0.14..8.50 rows=24 width=8) (actual time=0.005..0.008 rows=24 loops=1)
Heap Fetches: 0
-> Index Scan using player_team_id_index on player p (cost=0.29..19.20 rows=4 width=49) (actual time=0.003..0.005 rows=4 loops=24)
Index Cond: (team_id = t.id)
Planning Time: 0.260 ms
Execution Time: 0.184 ms