sqlpostgresqlindexingpattern-matchingpostgresql-performance

How to search for the acronym of a string column in an indexable way


I'm working on an express.js project, using Sequelize as the orm, and postgres as the engine.

I'm working on improving the performance of a slow select query that does many things, and one of them is comparing the acronym of the name of the warehouse against a search term using ilike, so if the search term is "STW", a warehouse named "Super Test Warehouse" will match. The query looks like this:

SELECT field1, field2, ...
FROM package p
LEFT OUTER JOIN warehouse w ON p.warehouse_id = w.id
-- many other joins
WHERE
    regexp_replace(w.name, '([a-z ])+', '', 'g') ILIKE '%search term%'
    -- many other conditions

The thing is, the main improvement I'm doing is creating indexes (gin for everything that does ilike '%something%'), and refactoring the query to be able to use them. But the regexp_replace() function doesn't work with indexes. Another important thing to note is that this solution only works if all the words in the names are correctly capitalized, which is the case with most of our data, but not all, so if there's a warehouse named "Super TEST Warehouse", or "super test Warehouse", it won't work correctly.

A simple solution I thought of, is to do this (assuming the search text is "STW"):

w.name ILIKE 'S% T% W%' OR w.name ILIKE '% S% T% W%'

It works on the average case, and it uses the gin index I created. But the problem is that it allows other words in between. So "STW" would also match "Super Fake Test Warehouse".

Is there any condition that would match the acronym correctly, and would use an index?

Would you recommend using a generated column for this (assuming I improve the existing regexp_replace, or use something else)? Are there any downsides of doing that? Ideally, I don't want to alter a table only for this feature, but could be a good solution if there's nothing else we can do.

Any other solutions?


Solution

  • What I'm doing in the end is making a generated column like this:

    ALTER TABLE warehouse
    ADD COLUMN name_acronym TEXT GENERATED ALWAYS AS (
        array_to_string(
            array(
                SELECT substring(word FROM 1 FOR 1)
                FROM unnest(string_to_array(name, ' ')) AS word
            ),
            ''
        )
    ) STORED;
    

    I also have this index for name:

    CREATE INDEX warehouse_name ON warehouse USING gin ("name" gin_trgm_ops);
    

    First, this solution is more accurate for the acronyms, since it actually takes the first letter of each word. And also, ilike queries against the generated column will make use the index.