postgresqlsimilaritypostgresql-10trigram

Postgres similarity function not appropriately using trigram index


I have a simple person table with a last_name column that I've added a GIST index with

CREATE INDEX last_name_idx ON person USING gist (last_name gist_trgm_ops);

According to the docs at https://www.postgresql.org/docs/10/pgtrgm.html, the <-> operator should utilize this index. However, when I actually try to use this difference operator using this query:

explain verbose select * from person where last_name <-> 'foobar' > 0.5;

I get this back:

Seq Scan on public.person  (cost=0.00..290.82 rows=4485 width=233)
  Output: person_id, first_name, last_name
  Filter: ((person.last_name <-> 'foobar'::text) < '0.5'::double precision)

And it doesn't look like the index is being used. However, if I use the % operator with this command:

explain verbose select * from person where last_name % 'foobar';

It seems to use the index:

Bitmap Heap Scan on public.person  (cost=4.25..41.51 rows=13 width=233)
  Output: person_id, first_name, last_name
  Recheck Cond: (person.last_name % 'foobar'::text)
  ->  Bitmap Index Scan on last_name_idx  (cost=0.00..4.25 rows=13 width=0)
        Index Cond: (person.last_name % 'foobar'::text)

I also noticed that if I move the operator to the select portion of the query, the index gets ignored again:

explain verbose select last_name % 'foobar' from person;

Seq Scan on public.person  (cost=0.00..257.19 rows=13455 width=1)
  Output: (last_name % 'foobar'::text)

Am I missing something obvious about how the similarity function uses the trigram index?

I am using Postgres 10.5 on OSX.

EDIT 1

As per Laurenz's suggestion, I tried setting enable_seqscan = off but unfortunately, the query with the <-> operator still seems to ignore the index.

show enable_seqscan;
 enable_seqscan
----------------
 off

explain verbose select * from person where last_name <-> 'foobar' < 0.5;

-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.person  (cost=10000000000.00..10000000290.83 rows=4485 width=233)
   Output: person_id, first_name, last_name
   Filter: ((person.last_name <-> 'foobar'::text) < '0.5'::double precision)

Solution

  • This behavior is normal for all kinds of indexes.

    The first query is not in a form that can use the index. For that, a condition would have to be of the form

    <indexed expression> <operator supported by the index> <quasi-constant>
    

    where the last expressions remains constant for the duration of the index scan and the operator returns a boolean value. Your expression ´last_name <-> 'foobar' > 0.5` is not of that form.

    The <-> operator has to be used in an ORDER BY clause to be able to use the index.

    The third query doesn't use the index because the query affects all rows of the table. An index does not speed up the evaluation of an expression, it is only useful to quickly identify a subset of the table (or to get rows in a certain sort order).