I have a table that contains ~100,000,000 records. In this table, there's a varchar(30) column that contains car license plates. I'm joining this table with another table that contains about the same number of records.
I need to query this table using ILIKE with different patterns, and order the results by a column from the joined table:
select *
from tvxa.ttvx51_msg_81 as ttvx51
left join tvxa.ttvx63_ult_ric as fnl on ttvx51.c_id_msg = fnl.c_id_msg and fnl.c_tip_ric = ('FNL')
where fnl.c_tar ilike any ('{"GD47%", "CD529%", "CX7530_", "GL573S_"}')
order by ttvx51.d_trn desc
limit 10 offset 0;
On the ttvx51.d_trn column there's a standard btree index, and on the fnl.c_tar column there are both a GiST index and a standard btree index.
The c_id_msg column is the primary key of both tables.
I'm having performance problems only with certain values in the ILIKE operation. For example, using this query:
select *
from tvxa.ttvx51_msg_81 as ttvx51
left join tvxa.ttvx63_ult_ric as fnl on ttvx51.c_id_msg = fnl.c_id_msg and fnl.c_tip_ric = ('FNL')
where fnl.c_tar ilike any ('{"AB142%"}')
order by ttvx51.d_trn desc
limit 10 offset 0;
it takes at least 20 minutes.
Performance seems very random: some values take milliseconds and some take minutes. The only correlation I've seen is that the slow inputs consist of at least 5 characters followed by a % at the end of the string.
If I remove the ORDER BY clause, the query takes only a few seconds.
Here is the query plan for the query with the ORDER BY clause and a 5-character input:
Limit  (cost=1001.16..59950.24 rows=10 width=274)
  ->  Gather Merge  (cost=1001.16..48020923.65 rows=8146 width=274)
        Workers Planned: 2
        ->  Nested Loop  (cost=1.14..48018983.37 rows=3394 width=274)
              ->  Parallel Index Scan Backward using ittvx51_msg_81_migrazione2 on ttvx51_msg_81 ttvx51  (cost=0.57..22944880.81 rows=37045867 width=204)
              ->  Index Scan using ittvx63_ult_ric_migrazione on ttvx63_ult_ric fnl  (cost=0.57..0.68 rows=1 width=70)
                    Index Cond: (c_id_msg = ttvx51.c_id_msg)
                    Filter: (((c_tar)::text ~~* ANY ('{AB142%}'::text[])) AND ((c_tip_ric)::text = 'FNL'::text))
With EXPLAIN (BUFFERS, ANALYZE):
Limit  (cost=1001.16..59951.63 rows=10 width=274) (actual time=453151.960..612564.881 rows=10 loops=1)
  Buffers: shared hit=291645856 read=1955824
  I/O Timings: read=1106382.770
  ->  Gather Merge  (cost=1001.16..48022052.31 rows=8146 width=274) (actual time=453151.958..612564.875 rows=10 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=291645856 read=1955824
        I/O Timings: read=1106382.770
        ->  Nested Loop  (cost=1.14..48020112.04 rows=3394 width=274) (actual time=388428.681..554079.550 rows=5 loops=3)
              Buffers: shared hit=291645856 read=1955824
              I/O Timings: read=1106382.770
              ->  Parallel Index Scan Backward using ittvx51_msg_81_migrazione2 on ttvx51_msg_81 ttvx51  (cost=0.57..22945552.81 rows=37045867 width=204) (actual time=0.057..49202.373 rows=16243236 loops=3)
                    Buffers: shared hit=48645632 read=645078
                    I/O Timings: read=91099.170
              ->  Index Scan using ittvx63_ult_ric_migrazione on ttvx63_ult_ric fnl  (cost=0.57..0.68 rows=1 width=70) (actual time=0.030..0.030 rows=0 loops=48729709)
                    Index Cond: (c_id_msg = ttvx51.c_id_msg)
                    Filter: (((c_tar)::text ~~* ANY ('{AB142%}'::text[])) AND ((c_tip_ric)::text = 'FNL'::text))
                    Rows Removed by Filter: 1
                    Buffers: shared hit=243000224 read=1310746
                    I/O Timings: read=1015283.600
Planning:
  Buffers: shared hit=490
Planning Time: 0.595 ms
Execution Time: 612564.974 ms
And here is the query plan for the query without the ORDER BY clause, with the same 5-character input:
Limit  (cost=431.81..565.35 rows=10 width=274)
  ->  Nested Loop  (cost=431.81..109212.18 rows=8146 width=274)
        ->  Bitmap Heap Scan on ttvx63_ult_ric fnl  (cost=431.24..34898.28 rows=8687 width=70)
              Recheck Cond: ((c_tar)::text ~~* ANY ('{AB142%}'::text[]))
              Filter: ((c_tip_ric)::text = 'FNL'::text)
              ->  Bitmap Index Scan on ittvx63_ult_ric_3  (cost=0.00..429.07 rows=9153 width=0)
                    Index Cond: ((c_tar)::text ~~* ANY ('{AB142%}'::text[]))
        ->  Index Scan using ttvx51_msg_81_migrazione_pkey_migrazione on ttvx51_msg_81 ttvx51  (cost=0.57..8.55 rows=1 width=204)
              Index Cond: (c_id_msg = fnl.c_id_msg)
With EXPLAIN (BUFFERS, ANALYZE):
Limit  (cost=435.81..569.35 rows=10 width=274) (actual time=253353.646..253361.341 rows=10 loops=1)
  Buffers: shared hit=830 read=300091
  I/O Timings: read=249287.405
  ->  Nested Loop  (cost=435.81..109216.18 rows=8146 width=274) (actual time=253353.644..253361.334 rows=10 loops=1)
        Buffers: shared hit=830 read=300091
        I/O Timings: read=249287.405
        ->  Bitmap Heap Scan on ttvx63_ult_ric fnl  (cost=435.24..34902.28 rows=8687 width=70) (actual time=253352.990..253354.078 rows=10 loops=1)
              Recheck Cond: ((c_tar)::text ~~* ANY ('{AB142%}'::text[]))
              Filter: ((c_tip_ric)::text = 'FNL'::text)
              Heap Blocks: exact=10
              Buffers: shared hit=789 read=300082
              I/O Timings: read=249280.389
              ->  Bitmap Index Scan on ittvx63_ult_ric_3  (cost=0.00..433.07 rows=9153 width=0) (actual time=253351.558..253351.558 rows=49 loops=1)
                    Index Cond: ((c_tar)::text ~~* ANY ('{AB142%}'::text[]))
                    Buffers: shared hit=786 read=300066
                    I/O Timings: read=249278.022
        ->  Index Scan using ttvx51_msg_81_migrazione_pkey_migrazione on ttvx51_msg_81 ttvx51  (cost=0.57..8.55 rows=1 width=204) (actual time=0.720..0.720 rows=1 loops=10)
              Index Cond: (c_id_msg = fnl.c_id_msg)
              Buffers: shared hit=41 read=9
              I/O Timings: read=7.016
Planning:
  Buffers: shared hit=1157 read=3
  I/O Timings: read=3.113
Planning Time: 5.534 ms
Execution Time: 253361.759 ms
And finally, here is the query plan for the query with the ORDER BY clause and a 3-character input:
Limit  (cost=1001.16..59950.24 rows=10 width=274)
  ->  Gather Merge  (cost=1001.16..48020923.65 rows=8146 width=274)
        Workers Planned: 2
        ->  Nested Loop  (cost=1.14..48018983.37 rows=3394 width=274)
              ->  Parallel Index Scan Backward using ittvx51_msg_81_migrazione2 on ttvx51_msg_81 ttvx51  (cost=0.57..22944880.81 rows=37045867 width=204)
              ->  Index Scan using ittvx63_ult_ric_migrazione on ttvx63_ult_ric fnl  (cost=0.57..0.68 rows=1 width=70)
                    Index Cond: (c_id_msg = ttvx51.c_id_msg)
                    Filter: (((c_tar)::text ~~* ANY ('{AB142%}'::text[])) AND ((c_tip_ric)::text = 'FNL'::text))
Is there a way to improve performance for this type of query? Am I doing something wrong?
I've tried adding different indexes with different algorithms, but the results seem the same.
Most of this sad tale is contained in the bitmap line from one of the plans: expected rows=9153, actual rows=49. So the row estimate is off by about 200-fold (9153 / 49 ≈ 187). This means that, by walking the index backward to fulfill the ORDER BY, the planner thinks it can find 10 rows and then stop after doing about 200 times less work than it actually has to do. Meanwhile, the plan it should be using (fetching all 49 rows and then sorting them) it thinks will be 200 times more work than it actually is. These two mistakes compound, so overall it misestimates the relative plan costs by about 40,000-fold (200 × 200). Not surprisingly, this leads to the wrong plan being selected.
There is not much you can do about the row estimate. PostgreSQL has no way to know how many rows will actually match a rare LIKE pattern. There is a big difference between the assumed selectivity of 0.0001 and the real selectivity of 5e-7, and the currently implemented statistics can't improve that estimate. I do have ideas for improving this for the specific case of LIKE/~~ (basically, have the planner keep shortening the pattern until it finds at least one match in the statistical sample with the shortened pattern, then use the number of trimmed-off characters to extrapolate the real selectivity downward), but I don't see those ideas getting implemented soon, or ever, and they would likely open a whole new can of worms.
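If you want to see the misestimate in isolation, one way (a sketch, using the table and one of the slow patterns from the question) is to run the pattern on its own:

explain (analyze, buffers)
select count(*)
from tvxa.ttvx63_ult_ric
where c_tar ilike 'AB142%';
-- Compare the planner's estimated row count (roughly 9153 above) with the
-- actual row count (49): that ratio of about 200 is the error that drives
-- the bad plan choice.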
You could force it to use the correct plan in two ways, sketched below. One is to use a MATERIALIZED CTE to force it to fetch all rows meeting the WHERE clause up front, and then apply the sort and the limit outside the CTE. The other is to change the ORDER BY expression in a way that defeats the index usage without changing the result, e.g. ORDER BY d_trn + interval '0' DESC. But either rewrite will force that plan even when it is not the best one, e.g. when the pattern is very unselective. So you might need to detect the pattern length on the client side and have the client choose which way to formulate the query for best execution.
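Here is a sketch of both rewrites against the tables from the question (untested against your data; the interval trick assumes d_trn is a timestamp, and the MATERIALIZED keyword requires PostgreSQL 12+):

-- 1. MATERIALIZED CTE: collect all matching rows first, then sort and limit.
with candidates as materialized (
    select *
    from tvxa.ttvx51_msg_81 as ttvx51
    left join tvxa.ttvx63_ult_ric as fnl
        on ttvx51.c_id_msg = fnl.c_id_msg and fnl.c_tip_ric = 'FNL'
    where fnl.c_tar ilike any ('{"AB142%"}')
)
select *
from candidates
order by d_trn desc
limit 10 offset 0;

-- 2. Expression in ORDER BY: defeats the backward index scan on d_trn
--    while producing the same ordering.
select *
from tvxa.ttvx51_msg_81 as ttvx51
left join tvxa.ttvx63_ult_ric as fnl
    on ttvx51.c_id_msg = fnl.c_id_msg and fnl.c_tip_ric = 'FNL'
where fnl.c_tar ilike any ('{"AB142%"}')
order by ttvx51.d_trn + interval '0' desc
limit 10 offset 0;

Both versions are cheap only when the pattern is selective, since they materialize or scan every matching row before the limit is applied.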
As a side note, the better plan would probably be much better still with a GIN index rather than a GiST index for the trigrams, along the lines of the sketch below. But that isn't going to change the planning issues.
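A minimal sketch of such an index (the index name here is made up, and this assumes the pg_trgm extension is available):

create extension if not exists pg_trgm;
-- gin_trgm_ops supports LIKE/ILIKE pattern matching on c_tar:
create index ittvx63_ult_ric_c_tar_trgm on tvxa.ttvx63_ult_ric
    using gin (c_tar gin_trgm_ops);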