ruby-on-rails postgresql full-text-search text-normalization

What is the best way to search for an exact match using Postgres full-text search?


I have a Postgres database with around 1.5 million records. In my Ruby on Rails app, I need to search the statement_text field (which can contain anywhere from 1 to hundreds of words).

My problem: I know I can use the pg_search gem to create scopes like search_all_words or search_any_words, but I'm not sure what the most efficient way is to ensure that only records with an exact match are returned in the result set.

That is, if I search "Pope Francis", I want it to find only those two words when they're consecutive and in the same order (as opposed to, say, "The pope is named Francis").

So far, I've just combined a GIN index with ILIKE for exact match searches. But given that a GIN index essentially works by storing the exact position of a word in every record, shouldn't there be a more efficient (non-ILIKE) way of ensuring that the search term is an exact match with the field?


Solution

  • Generally speaking, full-text search stems words according to the language dictionary in use. To avoid stemming, use the 'simple' dictionary; you can then use the ts_rank() function to determine the relevance of the phrase you are searching for.

    WITH t(v) AS ( VALUES
      ('Test sentence with Pope Francis'),
      ('Test Francis sentence with Pope '),
      ('The pope is named Francis')
    )
    SELECT v,ts_rank(tsv,q) as rank
    FROM t,
        to_tsvector('simple',v) as tsv,
        plainto_tsquery('simple','Pope Francis') AS q;
    

    Result:

                    v                 |   rank    
    ----------------------------------+-----------
     Test sentence with Pope Francis  | 0.0991032
     Test Francis sentence with Pope  | 0.0973585
     The pope is named Francis        | 0.0973585
    (3 rows)
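
    Note that ts_rank() only orders the rows here; all three still match the query. As a rough sketch, assuming PostgreSQL 9.6 or later, phraseto_tsquery() builds a query ('pope' <-> 'francis') that requires the words to be adjacent and in the same order. The table name statements and the index name below are hypothetical; only statement_text comes from the question.

    ```sql
    -- Same sample data as above; only the row with the consecutive phrase matches.
    WITH t(v) AS ( VALUES
      ('Test sentence with Pope Francis'),
      ('Test Francis sentence with Pope '),
      ('The pope is named Francis')
    )
    SELECT v
    FROM t
    WHERE to_tsvector('simple', v) @@ phraseto_tsquery('simple', 'Pope Francis');

    -- Hypothetical table "statements": an expression index the query above can use
    -- when the WHERE clause repeats the same to_tsvector('simple', ...) expression.
    CREATE INDEX statements_statement_text_fts_idx
        ON statements USING gin (to_tsvector('simple', statement_text));

    SELECT statement_text
    FROM statements
    WHERE to_tsvector('simple', statement_text)
          @@ phraseto_tsquery('simple', 'Pope Francis');
    ```

    The match is case-insensitive and ignores punctuation between the words. A GIN index stores the words but not their positions, so Postgres uses the index to find candidate rows and then rechecks word order against the row itself.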
    

    Without full-text search, you can simply make ILIKE pattern matching faster with the pg_trgm extension. Example is here.
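
    A minimal sketch of the pg_trgm approach, assuming a table named statements (a hypothetical name; only the statement_text column comes from the question) and an arbitrary index name:

    ```sql
    -- Enable trigram support (needs sufficient privileges on the database).
    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- Trigram GIN index; lets the planner use an index for LIKE / ILIKE '%...%'.
    CREATE INDEX statements_statement_text_trgm_idx
        ON statements USING gin (statement_text gin_trgm_ops);

    -- Case-insensitive substring match; with the index in place this no longer
    -- has to scan every row.
    SELECT statement_text
    FROM statements
    WHERE statement_text ILIKE '%Pope Francis%';
    ```

    Unlike the full-text variant, this matches the literal substring, so punctuation and spacing must match exactly and 'Pope Francisco' would also be returned.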