I have a non-optimized query with ILIKE
filters on 6 columns in the first table, and on one more column in the second table.
There are more than 13 million records in the table. This request is processed in 80 seconds. Which is a very long time. Since ILIKE is used, indexing does not work. How can I optimize my query? Maybe I need to create a FULL TEXT INDEX for all fields?
DDL and test data https://dbfiddle.uk/JCZVw0v6
Version PostgreSQL - 10.21
Query
from "sms"
left join "user_apps" on "user_apps"."id" = "sms"."user_app_id"
where "user_apps"."unique_id" ilike '%search_text%'
or "sms.sender" ilike '%search_text%'
or "sms.message" ilike '%search_text%'
or "sms.msisdn_receiver" ilike '%search_text%'
or "sms"."country_code" ilike '%search_text%'
or CAST(sms.id as VARCHAR(255)) ilike '%search_text%'
or "sms.sim" ilike '%search_text%'
or "sms"."status" ilike '%search_text%'
and "sms.type" = 'sms'
order by "smsId" desc
limit 51 offset 0
Cardinalities - 13 million records
EXPLAIN (ANALYZE,BUFFERS)
EXPLAIN (ANALYZE,BUFFERS) select *, "user_apps"."id" as "uaId", "sms"."id" as "smsId"
from "sms"
left join "user_apps" on "user_apps"."id" = "sms"."user_app_id"
where "user_apps"."unique_id" ilike '%13568034%'
or "sms.sender" ilike '%13568034%'
or "sms.message" ilike '%13568034%'
or "sms.msisdn_receiver" ilike '%13568034%'
or "sms"."country_code" ilike '%13568034%'
or CAST(sms.id as VARCHAR(255)) ilike '%13568034%'
or "sms.sim" ilike '%13568034%'
or "sms"."status" ilike '%13568034%'
and "sms.type" = 'sms'
order by "smsId" desc
limit 51 offset 0
Limit (cost=1000.77..17388.49 rows=51 width=1349) (actual time=35122.310..35359.126 rows=1 loops=1)
Buffers: shared hit=24704683 read=647586
-> Gather Merge (cost=1000.77..1754487.38 rows=5457 width=1349) (actual time=35122.305..35359.116 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=24704683 read=647586
-> Nested Loop Left Join (cost=0.71..1752837.34 rows=1364 width=1341) (actual time=28087.631..28087.696 rows=0 loops=5)
Filter: (((user_apps.unique_id)::text ~~* '%13568034%'::text) OR ((sms.sender)::text ~~* '%13568034%'::text) OR ((sms.message)::text ~~* '%13568034%'::text) OR ((sms.msisdn_receiver)::text ~~* '%13568034%'::text) OR ((sms.country_code)::text ~~* '%13568034%'::text) OR (((sms.id)::character varying(255))::text ~~* '%13568034%'::text) OR ((sms.sim)::text ~~* '%13568034%'::text) OR (((sms.status)::text ~~* '%13568034%'::text) AND ((sms.type)::text = 'sms'::text)))
Rows Removed by Filter: 2695161
Buffers: shared hit=24704683 read=647586
-> Parallel Index Scan Backward using sms_pkey on sms (cost=0.43..618823.77 rows=3410216 width=229) (actual time=0.092..7920.969 rows=2695161 loops=5)
Buffers: shared hit=7796122 read=647570
-> Index Scan using user_apps_pkey on user_apps (cost=0.28..0.29 rows=1 width=1112) (actual time=0.001..0.001 rows=0 loops=13475805)
Index Cond: (id = sms.user_app_id)
Buffers: shared hit=16908561 read=16
Planning time: 9.966 ms
Execution time: 35359.380 ms
The first problem is your outdated Postgres version. Postgres 10 has reached end-of-life late 2022 And you are not even on the latest point-release, which is 10.23. Upgrade to a current version. It will help your case in any case. (But it's not the solution.)
Seems like your query is in conflict with operator precedence. Fix by adding parentheses.
This is awkward:
or CAST(sms.id as VARCHAR(255)) ilike '%search_text%'
varchar(255)
is a misunderstanding. See:
One of these would make more sense:
or sms.id::text LIKE '%search_text%'
or sms.id::text ~ 'search_text'
(Below optimization gets rid of it completely.)
Preliminary:
SELECT *, u.id AS "uaId", s.id AS "smsId"
FROM sms s
LEFT JOIN user_apps u ON u.id = s.user_app_id
WHERE s.type = 'sms'
AND (u.unique_id ILIKE '%13568034%'
OR s.sender ILIKE '%13568034%'
OR s.message ILIKE '%13568034%'
OR s.msisdn_receiver ILIKE '%13568034%'
OR s.country_code ILIKE '%13568034%'
OR s.id::text LIKE '%13568034%' -- !
OR s.sim ILIKE '%13568034%'
OR s.status ILIKE '%13568034%') -- parentheses !!!
ORDER BY s.id DESC
LIMIT 51
OFFSET 0;
Core topics (after fixing broken logic above):
Trigram index on concatenated column of table sms
Trigram index on user_apps.unique_id
Break up query to get rid of "ugly OR
"
Replace SELECT *
with only needed columns in the SELECT
list
Your columns can be null, so concatenation with basic ||
gets tedious quickly. See:
We want concat_ws()
but IMMUTABLE
for the index, so create a custom one. See:
CREATE OR REPLACE FUNCTION immutable_concat_ws(text, VARIADIC text[])
RETURNS text
LANGUAGE internal IMMUTABLE PARALLEL SAFE AS
'text_concat_ws';
Requires superuser access. If you don't have that, fall back to next best option:
CREATE OR REPLACE FUNCTION immutable_concat_ws(text, VARIADIC text[])
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT concat_ws($1, $2);
$func$;
Use the above function in a trigram index to make this fast. See:
CREATE INDEX sms_multi_trgm_gin_index ON sms USING gin (immutable_concat_ws('|', sender, message, msisdn_receiver, country_code, id::text, sim, status) gin_trgm_ops);
I use '|' as separator. Use any character that's never used in search patterns, to avoid false positives!
And a plain index on user_apps
:
CREATE INDEX user_apps_unique_id_trgm_gin_index ON user_apps USING gin (unique_id gin_trgm_ops);
Reform your query to make use of above indexes
Also break up the ugly OR
that would still kill performance. See:
SELECT *
is typically a pointless waste for the many columns in your tables. Reduce to columns you need returned. I made up a list.
SELECT u.unique_id, s.sender, s.message, s.msisdn_receiver, s.country_code, s.id, s.sim, s.status
, u.id AS "uaId", s.id AS "smsId"
FROM sms s
LEFT JOIN user_apps u ON u.id = s.user_app_id
WHERE s.type = 'sms'
AND immutable_concat_ws('|', s.sender, s.message, s.msisdn_receiver, s.country_code, s.id::text, s.sim, s.status) ~* '135'
UNION
SELECT u.unique_id, s.sender, s.message, s.msisdn_receiver, s.country_code, s.id, s.sim, s.status
, u.id AS "uaId", s.id AS "smsId"
FROM sms s
JOIN user_apps u ON u.id = s.user_app_id -- INNER JOIN !
WHERE s.type = 'sms'
AND u.unique_id ~* '135'
ORDER BY "smsId" DESC
LIMIT 51
OFFSET 0;
Now it should be fast.
Related: