postgresqlindexing

Why does Postgres not use the index when the comparison value is derived from my function?


There is a table:

create table individual_profile_identity
(
    normalized_document_number varchar,
    normalized_name            varchar,
    birth_date                 date
);

There is an index in it:

create index individual_profile_identity_name_birth_dt_doc_idx
    on individual_profile_identity (
        normalized_name varchar_pattern_ops,
        birth_date,
        normalized_document_number
    );

There is a function that normalizes document_number and name fields:

create function rf_normalize_name(par_name character varying) returns character varying
    language plpgsql
as
$$
BEGIN
    RETURN nullif(upper(translate(par_name, 'A !@#$%^&*()_-+={}[]:;"''<>,.?/|\`~', 'A')), '0');
END
$$;

The query:

explain
select *
from individual_profile_identity
where normalized_name=rf_normalize_name('John Donne')
  and birth_date='2000-01-01'
  and normalized_document_number=rf_normalize_name('1234 567890');

The plan:

Index Scan using individual_profile_identity_name_birth_dt_doc_idx on individual_profile_identity  (cost=0.41..1935.18 rows=1 width=173)
  Index Cond: (birth_date = '2000-01-01'::date)
  Filter: (((normalized_name)::text = (rf_normalize_name('John Donne'::character varying))::text) AND ((normalized_document_number)::text = (rf_normalize_name('1234 567890'::character varying))::text))

As you can see, it uses the index but only with birth_date in it:

Index Cond: (birth_date = '2000-01-01'::date)

It completely doesn't make sense to me why postgres won't first run rf_normalize_name function and only then apply index to all 3 fields

Could anybody please explain the behaviour?


Solution

  • The CREATE FUNCTION statement by default creates a VOLATILE function if you don't specify otherwise. This "indicates that the function value can change even within a single table scan, so no optimizations can be made", forcing Postgres to call the function on every individual row and not being able to use the index.

    You should

    CREATE FUNCTION rf_normalize_name(par_name varchar) RETURNS varchar
        LANGUAGE SQL
        IMMUTABLE
        PARALLEL SAFE
        RETURN nullif(upper(translate(par_name, 'A !@#$%^&*()_-+={}[]:;"''<>,.?/|\`~', 'A')), '0');
    

    then your query will use the index on all three columns.