postgresqlindexingtext

Postgres index usage on varchar columns


Some of my collegues claimed: Postgres use a simple index little bit less than a converted to text one...

CREATE INDEX simple_index ON my_table(my_varchar);

CREATE INDEX text_index ON my_table((my_varchar::text));

And I wonder that the analyze always convert it to text why?

explain analyze select * from my_table where my_varchar = 'WONDERFUL_VARCHAR';

                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using simple_index on my_table  (cost=0.42..8.92 rows=7 width=1192) (actual time=0.106..0.112 rows=4 loops=1)
   Index Cond: ((my_varchar)::text = 'WONDERFUL_VARCHAR'::text)
 Planning Time: 0.287 ms
 Execution Time: 0.298 ms

I probed it an postgres16 and postgres can use simple varchar index, if I cast the constant variable to text or not.

What is the difference of two indexes? Is there some cases, when text converted index will use postgres and the simple one not?


Solution

  • The text_index definition isn't valid because you need to duplicate the parentheses to use an expression in that context:

    CREATE INDEX text_index ON my_table((my_varchar::text));
    

    This would get you an error:

    CREATE INDEX text_index ON my_table(my_varchar::text);
    
    ERROR:  syntax error at or near "::"
    LINE 1: CREATE INDEX text_index ON my_table(my_varchar::text);...
                                                          ^
    

    There is a very subtle difference between the two and a case to be made against what your colleagues suggest. You might be accidentally disabling index-only scans:
    demo at db<>fiddle

    explain analyze verbose 
    select my_varchar 
    from my_table 
    where my_varchar/*::text doesn't matter here*/= '0f00d3655515285852baa9c7e0e4490d';
    
    Index Only Scan using simple_index on public.my_table (cost=0.42..4.44 rows=1 width=62) (actual time=0.070..0.072 rows=1 loops=1)
    Output: my_varchar
    Index Cond: (my_table.my_varchar = '0f00d3655515285852baa9c7e0e4490d'::text)
    Heap Fetches: 0

    If you take away the simple, leaving just the expression-based text_index, this gets downgraded from Index-Only to a regular Index Scan:

    Index Scan using text_index on public.my_table (cost=0.42..8.44 rows=1 width=61) (actual time=0.059..0.061 rows=1 loops=1)
    Output: my_varchar
    Index Cond: ((my_table.my_varchar)::text = '0f00d3655515285852baa9c7e0e4490d'::text)

    Postgres doesn't rely on the index set up on an expression to allow extracting the underlying value from it so even if you're only fetching my_varchar based exclusively on the my_varchar and nothing else, you'll see it jump to the heap anyways even though it should be able skip that and give you the value it just found in the index, saving a potentially expensive, needless additional step.
    That's unless you explicitly include the raw value as payload:

    CREATE INDEX text_index2 ON my_table((my_varchar::text))INCLUDE(my_varchar);
    

    Which costs you additional space and makes things slower because a larger index is slower to search. Still, reading the value right from the index you had to read anyways, beats having to do an additional jump from that to the table.

    Index Only Scan using text_index2 on public.my_table (cost=0.42..4.44 rows=1 width=62) (actual time=0.093..0.094 rows=1 loops=1)
    Output: my_varchar
    Index Cond: (((my_table.my_varchar)::text) = '0f00d3655515285852baa9c7e0e4490d'::text)
    Heap Fetches: 0

    If you rarely ever do a narrow select like this, it shouldn't matter much because the presence of any other, not indexed and not included field will require the jump from the index to the table to get the missing values. Looking at the 1192 width of your rows, you do have more stuff in there, so it might be unlikely you ever saw it do an Index-Only and fitting them all as included payload might exceed the limit.
    It might as well fit right under it: here you can see even a slightly wider table getting entirely covered. Worth a shot if it's fairly static.

    If you do really follow the anti-pattern of select* (without explicitly listing field names you need, in the order you need, which is doomed to break as soon as someone adds, swaps, reorders something), consider listing field names explicitly instead, then see if you really need all the columns and if not, whether you could set up indexes covering just the ones you do need. If you can afford the space and some overhead required for maintaining the index, there's some performance to be gained there.