node.js · postgresql · elasticsearch · sequelize.js · full-text-search

Best alternatives for a fast search on a big table on a Postgres database?


We have an Express API that connects to a Postgres database using Sequelize.

One of our endpoints returns a list of sales, which accepts different query parameters like page, column order, filters, and a search term.

This endpoint performs two queries: one to get a single page of results based on the provided parameters, and another to get the total count of elements matching the same parameters. Both queries share the same FROM and WHERE clauses.
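Roughly, the pair of queries looks like this (table and column names are illustrative, not our actual schema; the :params are bind placeholders):

    SELECT s.*
    FROM sales s
    JOIN clients c ON c.id = s.client_id
    WHERE s.status = :status              -- exact-match filters
    ORDER BY s.created_at DESC
    LIMIT :pageSize OFFSET :offset;

    SELECT count(*)
    FROM sales s
    JOIN clients c ON c.id = s.client_id
    WHERE s.status = :status;             -- same FROM and WHERE as above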

Most of the query is exact matches (column = value), but when a search term is provided, around 10 ILIKE conditions are added to the query:
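Something along these lines (column names are illustrative; the last condition is the concatenation mentioned below):

    AND (
         s.code ILIKE '%' || :search || '%'
      OR s.description ILIKE '%' || :search || '%'
      OR c.email ILIKE '%' || :search || '%'
      -- ...several more columns...
      OR concat(c.first_name, ' ', c.last_name) ILIKE '%' || :search || '%'
    )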

We're testing with the client that has the most data (approx. 100k rows). When those ILIKEs are not in the query, the endpoint usually takes between 1 and 2 seconds, which is acceptable, but when they are, it takes around 6 seconds. The fact that we run two queries makes the difference even bigger.

The raw query takes around 600 ms without the ILIKEs and around 3 seconds with them. If I run EXPLAIN ANALYZE on the query, it shows that the ILIKEs take most of the time, especially the one with the concatenation (the client's first and last name).

There's some simple stuff we can do, like reducing the number of columns we search on (we're evaluating whether all of them are necessary), dropping the acronym thing, and using something other than that concat() for the full name, like the || operator (which I'm trying to do in Sequelize). We should also add indexes.
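For the full name specifically, the idea is to swap the concat() call for the || operator, roughly like this (illustrative column names; in Sequelize this kind of fragment can be passed through as a literal or raw query):

    -- current condition
    concat(c.first_name, ' ', c.last_name) ILIKE '%' || :search || '%'

    -- with the || operator instead
    (c.first_name || ' ' || c.last_name) ILIKE '%' || :search || '%'

One caveat: unlike concat(), the || operator propagates NULLs, so if either name column can be NULL the expression would need a coalesce() around it.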

But other than that, what would you recommend to do? I'm evaluating using Elasticsearch, but it may be overkill and needs time to set up and maintain.

I also read about Full Text Search in Postgres, which would make searching much faster and would give us more advanced search features (similar to Elasticsearch). Would you recommend it? I think you would pair this with GiST or GIN indexes, right?

Any other recommendation?


Solution

  • For this problem, I would probably start by evaluating which of the operations is taking the most time. It may be that the concatenation and ILIKE clauses are eating the time, or it may be that your storage is slow to load the data into RAM. It's very difficult to tell from a description of the query alone.

    To that end, I highly recommend using Postgres' query plan analysis tools. (https://www.postgresql.org/docs/current/sql-explain.html)

    EXPLAIN ANALYZE will give you a query plan (the internal execution steps that Postgres is taking), which you can then copy-paste into a query plan visualizer like Dalibo (https://explain.dalibo.com/).

    Once you've done that, you can see plainly which exact part of your query is taking the most time.
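    For example, on a simplified version of the count query (table and column names are illustrative):

        EXPLAIN (ANALYZE, BUFFERS)
        SELECT count(*)
        FROM sales s
        JOIN clients c ON c.id = s.client_id
        WHERE concat(c.first_name, ' ', c.last_name) ILIKE '%smith%';

    The text output can be pasted into the visualizer as-is; look for the plan nodes with the largest actual time and for sequential scans over large row counts.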

    That said, if you haven't indexed the columns that you are searching on, that is step 0. Everything after indexing is an afterthought. Correct indexing will give you 90%+ time saved on many queries. Everything else is on the order of 5 or 10%.

    Note that a regular B-tree Postgres index on a string column will only actually be used by the Postgres query planner if certain conditions are met. Most importantly, it will only be used when searching for substrings at the start of the overall string (anchored prefix searches); a pattern with a leading wildcard, like the ILIKE '%term%' conditions here, cannot use it.

    To get around this, you will need to use a Postgres full text search index. The choice between GIN and GiST indexes is very minor compared to the time saved by simply having an index at all.
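    As a rough sketch of what that could look like (expressions and names are illustrative, not a drop-in for your schema):

        -- A plain B-tree index only helps anchored prefix searches, not '%term%' patterns
        CREATE INDEX clients_last_name_idx ON clients (last_name);

        -- Full text search: index a tsvector expression with GIN...
        CREATE INDEX clients_fts_idx ON clients
          USING gin (to_tsvector('simple', first_name || ' ' || last_name));

        -- ...and query with the same expression so the planner can use the index
        SELECT c.*
        FROM clients c
        WHERE to_tsvector('simple', c.first_name || ' ' || c.last_name)
              @@ plainto_tsquery('simple', 'john smith');

    Keep in mind that full text search matches whole words/lexemes rather than arbitrary substrings, so it behaves differently from ILIKE '%term%'; whether that trade-off is acceptable depends on what your users expect from the search box.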