sqlpostgresqlquery-optimizationdatabase-indexestrigram

Postgres Select ILIKE %text% is Slow On Large String Rows


I have a table which have only 7 columns and one of that column stores long text data for every row. Average character length for that text column data is approximately 1500 characters. And this table got 500.000 rows.

When I use a select query and not take that text column with it, there is no problem, query takes 10 secs as expected.

But if I add this long text column to my query like Select * from table_1 it takes 3 or 4 minutes to complete this query and fill datatable with dataadapter.

And why I need to lookup all that long text column records? Because I need to use text filter on it like:

SELECT * 
FROM table_1 
WHERE longtextcolumn ILIKE ANY (ARRAY['%texttosearch1%', '%texttosearch2%'])

What should I do to speed up that progress? Table partitioning could solve this speed issue? Or should I look for indexing?


Solution

  • After reading all answers in this post I started to learn about indexes and especially gin indexing on Postgres. And today I overcome my problem. Before using gin indexing method my query takes 8 mins to complete. Now same query takes 50 ms. It is a dramatic performance difference and I want to explain what I did step by step for the community.

    EXAMPLE

    Let's assume we have a table named table_1 and this table has a column named long_text. That long_text column stores long text data like strings which have length of 1500 characters. And this table_1 has 800.000 rows.

    STEP 1

    Postgres has a datatype which named as ts_vector. This ts_vector data type takes your long text and calculate the single distinct words of it(language based) and index that words in it. So we need to create a ts_vector column on our table_1 and convert our long_text column and populate ts_vector data from it. This will be like:

    ALTER table table_1 
    ADD COLUMN long_text_tsv TS_VECTOR;
    

    STEP 2

    Populate ts_vector data from our long_text column:

    UPDATE table_1 t1
    SET long_text_tsv = to_tsvector('english', t1.long_text)
    FROM table_1 t2;
    

    STEP 3

    Create gin index of that newly populated long_text_tsv column

    CREATE INDEX tsv_index
    ON table_1
    USING gin(long_text_tsv);
    

    After that you are ready to filter your long text data using that newly created and indexed tsv data

    STEP 4

    With tsvector data, your search query should be like this:

    SELECT long_text FROM table_1
    WHERE long_text_tsv @@ to_tsquery('john<->lennon&music');
    

    ts_vector data should be searched by ts_query data type. In above query <-> means followed by, & means 'AND' operator.

    With this method I explained, my search query speeds up 100 times faster than the old one.

    I am not sure if I do this all in the right way but I assume I am doing ok because everything is looking fine in my program now. If something is wrong in that answer, please warn me about that.