postgresqlindexingtrigram

What is the correct way to create a case-insensitive trigram-index in postgres?


...and is it something I should do anyway?

From my brief testing, making a trigram index and searching using

where name like '%query%'

is faster than

where name ilike '%query%'

So it seems like I should do it, but I'm surprised I've not been able to find out how.

(My test data is fairly homogenous - 1.5M rows made up of 16 entries repeated. I can imagine this might mess with the results.)

This is how I expected it to work (note the lower(name)):

create extension pg_trgm;

create table users(name text);

insert into users values('Barry');

create index "idx" on users using gin (lower(name) gin_trgm_ops);

select count(*) from users where (name like '%bar%');

but this returns 0.

Either of

select count(*) from users where (name like '%Bar%');

or

select count(*) from users where (name ilike '%bar%');

work, which makes me believe the trigrams in the index are not lower()'d. Am I misunderstanding how this works under the hood? Is it not possible to call lower there?

I note that this

select show_trgm('Barry')

returns lowercase trigrams:

{"  b"," ba",arr,bar,rry,"ry "}

So I'm perplexed.


Solution

  • The trigrams are definitely lower case.

    The conundrum becomes cleared up when you consider how trigram indexes are used: they act as a filter that eliminates the majority of non-matches, but allow false positive results (among other reasons is their case insensitivity). That's why there always has to be a recheck to eliminate those false positives, and that us why you always get a bitmap index scan.

    The ILIKE query may be slower because it has more results, or because case insensitive comparisons require more effort.