Some of my collegues claimed: Postgres use a simple index little bit less than a converted to text one...
CREATE INDEX simple_index ON my_table(my_varchar);
CREATE INDEX text_index ON my_table((my_varchar::text));
And I wonder that the analyze always convert it to text why?
explain analyze select * from my_table where my_varchar = 'WONDERFUL_VARCHAR';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Index Scan using simple_index on my_table (cost=0.42..8.92 rows=7 width=1192) (actual time=0.106..0.112 rows=4 loops=1)
Index Cond: ((my_varchar)::text = 'WONDERFUL_VARCHAR'::text)
Planning Time: 0.287 ms
Execution Time: 0.298 ms
I probed it an postgres16 and postgres can use simple varchar index, if I cast the constant variable to text or not.
What is the difference of two indexes? Is there some cases, when text converted index will use postgres and the simple one not?
The text_index
definition isn't valid because you need to duplicate the parentheses to use an expression in that context:
CREATE INDEX text_index ON my_table((my_varchar::text));
This would get you an error:
CREATE INDEX text_index ON my_table(my_varchar::text);
ERROR: syntax error at or near "::" LINE 1: CREATE INDEX text_index ON my_table(my_varchar::text);... ^
There is a very subtle difference between the two and a case to be made against what your colleagues suggest. You might be accidentally disabling index-only scans:
demo at db<>fiddle
explain analyze verbose
select my_varchar
from my_table
where my_varchar/*::text doesn't matter here*/= '0f00d3655515285852baa9c7e0e4490d';
Index Only Scan using simple_index on public.my_table (cost=0.42..4.44 rows=1 width=62) (actual time=0.070..0.072 rows=1 loops=1) |
Output: my_varchar |
Index Cond: (my_table.my_varchar = '0f00d3655515285852baa9c7e0e4490d'::text) |
Heap Fetches: 0 |
If you take away the simple, leaving just the expression-based text_index
, this gets downgraded from Index-Only to a regular Index Scan:
Index Scan using text_index on public.my_table (cost=0.42..8.44 rows=1 width=61) (actual time=0.059..0.061 rows=1 loops=1) |
Output: my_varchar |
Index Cond: ((my_table.my_varchar)::text = '0f00d3655515285852baa9c7e0e4490d'::text) |
Postgres doesn't rely on the index set up on an expression to allow extracting the underlying value from it so even if you're only fetching my_varchar
based exclusively on the my_varchar
and nothing else, you'll see it jump to the heap anyways even though it should be able skip that and give you the value it just found in the index, saving a potentially expensive, needless additional step.
That's unless you explicitly include
the raw value as payload:
CREATE INDEX text_index2 ON my_table((my_varchar::text))INCLUDE(my_varchar);
Which costs you additional space and makes things slower because a larger index is slower to search. Still, reading the value right from the index you had to read anyways, beats having to do an additional jump from that to the table.
Index Only Scan using text_index2 on public.my_table (cost=0.42..4.44 rows=1 width=62) (actual time=0.093..0.094 rows=1 loops=1) |
Output: my_varchar |
Index Cond: (((my_table.my_varchar)::text) = '0f00d3655515285852baa9c7e0e4490d'::text) |
Heap Fetches: 0 |
If you rarely ever do a narrow select like this, it shouldn't matter much because the presence of any other, not indexed and not include
d field will require the jump from the index to the table to get the missing values. Looking at the 1192
width of your rows, you do have more stuff in there, so it might be unlikely you ever saw it do an Index-Only and fitting them all as included payload might exceed the limit.
It might as well fit right under it: here you can see even a slightly wider table getting entirely covered. Worth a shot if it's fairly static.
If you do really follow the anti-pattern of select*
(without explicitly listing field names you need, in the order you need, which is doomed to break as soon as someone adds, swaps, reorders something), consider listing field names explicitly instead, then see if you really need all the columns and if not, whether you could set up indexes covering just the ones you do need. If you can afford the space and some overhead required for maintaining the index, there's some performance to be gained there.