My table looks like:
create table invoices
(
id serial not null,
data jsonb,
modified date,
search_string text not null
);
I need to search the table with ILIKE
on search_string
.
There may be many different search queries in the same request.
My query looks like:
SELECT *
FROM invoices
WHERE (
search_string ILIKE '%1%'
OR search_string ILIKE '%2%'
OR search_string ILIKE '%3%'
)
Explain for searching without index
Seq Scan on invoices (cost=0.00..147139.51 rows=1004406 width=1006) (actual time=0.038..2341.489 rows=1004228 loops=1)
Filter: ((search_string ~~* '%1%'::text) OR (search_string ~~* '%2%'::text) OR (search_string ~~* '%3%'::text))
Rows Removed by Filter: 1943
Planning Time: 4.682 ms
Execution Time: 2427.400 ms
I tried to make it faster by creating GIN index:
CREATE EXTENSION pg_trgm;
CREATE INDEX invoices_search_string_trigram_index ON invoices USING gin (search_string gin_trgm_ops);
Explain for searching with index
Bitmap Heap Scan on invoices_invoice (cost=414767.41..561902.40 rows=1004149 width=1006) (actual time=14878.331..17862.840 rows=1004228 loops=1)
Recheck Cond: ((search_string ~~* '%1%'::text) OR (search_string ~~* '%2%'::text) OR (search_string ~~* '%3%'::text))
Rows Removed by Index Recheck: 1943
Heap Blocks: exact=63341 lossy=66186
-> BitmapOr (cost=414767.41..414767.41 rows=1006171 width=0) (actual time=14842.199..14842.199 rows=0 loops=1)
-> Bitmap Index Scan on trgm_idx_search_string (cost=0.00..137979.36 rows=874048 width=0) (actual time=4520.466..4520.466 rows=546232 loops=1)
Index Cond: (search_string ~~* '%1%'::text)
-> Bitmap Index Scan on trgm_idx_search_string (cost=0.00..138208.03 rows=904538 width=0) (actual time=4357.453..4357.453 rows=546232 loops=1)
Index Cond: (search_string ~~* '%2%'::text)
-> Bitmap Index Scan on trgm_idx_search_string (cost=0.00..137826.91 rows=853721 width=0) (actual time=5964.276..5964.276 rows=546232 loops=1)
Index Cond: (search_string ~~* '%3%'::text)
Planning Time: 1.198 ms
Execution Time: 17971.102 ms
Why are my index searches slower than seq scan?
Is there any way to make this type of search faster?
Your problem are probably the 66186 lossy blocks. Increase work_mem
until you have only exact blocks.
Considering that you have a million result rows, I'd say that this query will never be very fast unless you reduce the number to result rows.