There is a table:
create table individual_profile_identity
(
normalized_document_number varchar,
normalized_name varchar,
birth_date date
);
There is an index in it:
create index individual_profile_identity_name_birth_dt_doc_idx
on individual_profile_identity (
normalized_name varchar_pattern_ops,
birth_date,
normalized_document_number
);
There is a function that normalizes document_number
and name
fields:
create function rf_normalize_name(par_name character varying) returns character varying
language plpgsql
as
$$
BEGIN
RETURN nullif(upper(translate(par_name, 'A !@#$%^&*()_-+={}[]:;"''<>,.?/|\`~', 'A')), '0');
END
$$;
The query:
explain
select *
from individual_profile_identity
where normalized_name=rf_normalize_name('John Donne')
and birth_date='2000-01-01'
and normalized_document_number=rf_normalize_name('1234 567890');
The plan:
Index Scan using individual_profile_identity_name_birth_dt_doc_idx on individual_profile_identity (cost=0.41..1935.18 rows=1 width=173)
Index Cond: (birth_date = '2000-01-01'::date)
Filter: (((normalized_name)::text = (rf_normalize_name('John Donne'::character varying))::text) AND ((normalized_document_number)::text = (rf_normalize_name('1234 567890'::character varying))::text))
As you can see, it uses the index but only with birth_date
in it:
Index Cond: (birth_date = '2000-01-01'::date)
It completely doesn't make sense to me why postgres won't first run rf_normalize_name
function and only then apply index to all 3 fields
Could anybody please explain the behaviour?
The CREATE FUNCTION
statement by default creates a VOLATILE
function if you don't specify otherwise. This "indicates that the function value can change even within a single table scan, so no optimizations can be made", forcing Postgres to call the function on every individual row and not being able to use the index.
You should
CREATE FUNCTION rf_normalize_name(par_name varchar) RETURNS varchar
LANGUAGE SQL
IMMUTABLE
PARALLEL SAFE
RETURN nullif(upper(translate(par_name, 'A !@#$%^&*()_-+={}[]:;"''<>,.?/|\`~', 'A')), '0');
then your query will use the index on all three columns.