sqlpostgresqlsearchpostgresql-performance

How to make a query with ILIKE filters on multiple columns in two tables?


I have a non-optimized query with ILIKE filters on 6 columns in the first table, and on one more column in the second table.

There are more than 13 million records in the table. This request is processed in 80 seconds. Which is a very long time. Since ILIKE is used, indexing does not work. How can I optimize my query? Maybe I need to create a FULL TEXT INDEX for all fields?

  1. DDL and test data https://dbfiddle.uk/JCZVw0v6

  2. Version PostgreSQL - 10.21

  3. Query

from "sms" 
left join "user_apps" on "user_apps"."id" = "sms"."user_app_id" 
where "user_apps"."unique_id" ilike '%search_text%'
or "sms.sender" ilike '%search_text%'
or "sms.message" ilike '%search_text%'
or "sms.msisdn_receiver" ilike '%search_text%'
or "sms"."country_code" ilike '%search_text%'
or CAST(sms.id as VARCHAR(255)) ilike '%search_text%'
or "sms.sim" ilike '%search_text%' 
or "sms"."status" ilike '%search_text%'
and "sms.type" = 'sms' 
order by "smsId" desc 
limit 51 offset 0
  1. Cardinalities - 13 million records

  2. EXPLAIN (ANALYZE,BUFFERS)

EXPLAIN (ANALYZE,BUFFERS) select *, "user_apps"."id" as "uaId", "sms"."id" as "smsId"
from "sms"
    left join "user_apps" on "user_apps"."id" = "sms"."user_app_id"
where "user_apps"."unique_id" ilike '%13568034%'
    or "sms.sender" ilike '%13568034%'
    or "sms.message" ilike '%13568034%'
    or "sms.msisdn_receiver" ilike '%13568034%'
    or "sms"."country_code" ilike '%13568034%'
    or CAST(sms.id as VARCHAR(255)) ilike '%13568034%'
    or "sms.sim" ilike '%13568034%'
    or "sms"."status" ilike '%13568034%'
    and "sms.type" = 'sms'
    order by "smsId" desc
    limit 51 offset 0
Limit  (cost=1000.77..17388.49 rows=51 width=1349) (actual time=35122.310..35359.126 rows=1 loops=1)
  Buffers: shared hit=24704683 read=647586
  ->  Gather Merge  (cost=1000.77..1754487.38 rows=5457 width=1349) (actual time=35122.305..35359.116 rows=1 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        Buffers: shared hit=24704683 read=647586
        ->  Nested Loop Left Join  (cost=0.71..1752837.34 rows=1364 width=1341) (actual time=28087.631..28087.696 rows=0 loops=5)
              Filter: (((user_apps.unique_id)::text ~~* '%13568034%'::text) OR ((sms.sender)::text ~~* '%13568034%'::text) OR ((sms.message)::text ~~* '%13568034%'::text) OR ((sms.msisdn_receiver)::text ~~* '%13568034%'::text) OR ((sms.country_code)::text ~~* '%13568034%'::text) OR (((sms.id)::character varying(255))::text ~~* '%13568034%'::text) OR ((sms.sim)::text ~~* '%13568034%'::text) OR (((sms.status)::text ~~* '%13568034%'::text) AND ((sms.type)::text = 'sms'::text)))
              Rows Removed by Filter: 2695161
              Buffers: shared hit=24704683 read=647586
              ->  Parallel Index Scan Backward using sms_pkey on sms  (cost=0.43..618823.77 rows=3410216 width=229) (actual time=0.092..7920.969 rows=2695161 loops=5)
                    Buffers: shared hit=7796122 read=647570
              ->  Index Scan using user_apps_pkey on user_apps  (cost=0.28..0.29 rows=1 width=1112) (actual time=0.001..0.001 rows=0 loops=13475805)
                    Index Cond: (id = sms.user_app_id)
                    Buffers: shared hit=16908561 read=16
Planning time: 9.966 ms
Execution time: 35359.380 ms

Solution

  • Postgres version

    The first problem is your outdated Postgres version. Postgres 10 has reached end-of-life late 2022 And you are not even on the latest point-release, which is 10.23. Upgrade to a current version. It will help your case in any case. (But it's not the solution.)

    Fix your query first

    Seems like your query is in conflict with operator precedence. Fix by adding parentheses.

    This is awkward:

    or CAST(sms.id as VARCHAR(255)) ilike '%search_text%'
    

    varchar(255) is a misunderstanding. See:

    One of these would make more sense:

    or sms.id::text LIKE '%search_text%'
    or sms.id::text ~ 'search_text'
    

    (Below optimization gets rid of it completely.)

    Preliminary:

    SELECT *, u.id AS "uaId", s.id AS "smsId"
    FROM   sms s
    LEFT   JOIN user_apps u ON u.id = s.user_app_id
    WHERE  s.type = 'sms'
    AND   (u.unique_id       ILIKE '%13568034%'
        OR s.sender          ILIKE '%13568034%'
        OR s.message         ILIKE '%13568034%'
        OR s.msisdn_receiver ILIKE '%13568034%'
        OR s.country_code    ILIKE '%13568034%'
        OR s.id::text         LIKE '%13568034%'  -- !
        OR s.sim             ILIKE '%13568034%'
        OR s.status          ILIKE '%13568034%')  -- parentheses !!!
    ORDER  BY s.id DESC
    LIMIT  51
    OFFSET 0;
    

    Rewrite for performance

    Core topics (after fixing broken logic above):

    Auxiliary function

    Your columns can be null, so concatenation with basic || gets tedious quickly. See:

    We want concat_ws() but IMMUTABLE for the index, so create a custom one. See:

    CREATE OR REPLACE FUNCTION immutable_concat_ws(text, VARIADIC text[])
      RETURNS text
      LANGUAGE internal IMMUTABLE PARALLEL SAFE AS
    'text_concat_ws';
    

    Requires superuser access. If you don't have that, fall back to next best option:

    CREATE OR REPLACE FUNCTION immutable_concat_ws(text, VARIADIC text[])
      RETURNS text
      LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
    $func$
    SELECT concat_ws($1, $2);
    $func$;
    

    Trigram indexes

    Use the above function in a trigram index to make this fast. See:

    CREATE INDEX sms_multi_trgm_gin_index ON sms USING gin (immutable_concat_ws('|', sender, message, msisdn_receiver, country_code, id::text, sim, status) gin_trgm_ops);
    

    I use '|' as separator. Use any character that's never used in search patterns, to avoid false positives!

    And a plain index on user_apps:

    CREATE INDEX user_apps_unique_id_trgm_gin_index ON user_apps USING gin (unique_id gin_trgm_ops);
    

    Query

    Reform your query to make use of above indexes

    Also break up the ugly OR that would still kill performance. See:

    SELECT * is typically a pointless waste for the many columns in your tables. Reduce to columns you need returned. I made up a list.

    SELECT u.unique_id, s.sender, s.message, s.msisdn_receiver, s.country_code, s.id, s.sim, s.status
         , u.id AS "uaId", s.id AS "smsId"
    FROM   sms s
    LEFT   JOIN user_apps u ON u.id = s.user_app_id
    WHERE  s.type = 'sms'
    AND    immutable_concat_ws('|', s.sender, s.message, s.msisdn_receiver, s.country_code, s.id::text, s.sim, s.status) ~* '135'
    
    UNION
    SELECT u.unique_id, s.sender, s.message, s.msisdn_receiver, s.country_code, s.id, s.sim, s.status
         , u.id AS "uaId", s.id AS "smsId"
    FROM   sms s
    JOIN   user_apps u ON u.id = s.user_app_id  -- INNER JOIN !
    WHERE  s.type = 'sms'
    AND    u.unique_id ~* '135'
       
    ORDER  BY "smsId" DESC
    LIMIT  51
    OFFSET 0;
    

    fiddle

    Now it should be fast.

    Related: