postgresqlsearchtrigramtextmatching

Match parts of a query with trigram index


I have a table of events where each event has a title and description. Searches should search both columns:

title: Dick(ens) and Jane
description: Meet weekly to discuss classic books!

Given the above, I would like a query of book club to match.

My index:

CREATE INDEX evsearch_idx on events using gist((title || ' ' || description) gist_trgm_ops)

My query:

SELECT * FROM events WHERE title || ' ' || description ILIKE '%book club%'

I think the problem is that I'm fundamentally misunderstanding how an index works and therefore it's attempting to match the entire string book club to various three-letter combinations but I'm not sure how to fix this.


Solution

  • ILIKE will always try to match the whole string. For example, to match ILIKE '%book club%', a string needs to include the whole string book club.

    If you want to use the similarity matching potential of pg_trm, you need to use the matching operators for which it was designed. Taken from the doc:

    1. % returns true if its arguments have a similarity that is greater than the current similarity threshold set by pg_trgm.similarity_threshold.
    2. <% returns true if the similarity between the trigram set in the first argument and a continuous extent of an ordered trigram set in the second argument is greater than the current word similarity threshold set by pg_trgm.word_similarity_threshold parameter.

    Side note #1: Make it case insensitive by adding lower(...) in your index definition and in your queries.

    Side note #2: title || ' ' || description will return NULL if title or description is NULL. If you want to guard against one of the 2 values being NULL, create the index on COALESCE(title, '') || ' ' || COALESCE(description, '')

    Side note #3: pg_trgm can only do so much. For really complex queries, you might want to look into full text search