
What's wrong with my GIN index? Why can't I avoid a seq scan?


I've created a table like this,

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english', 'title'));
CREATE UNIQUE INDEX md5_uniq_idx ON mytable(hash);

When I query the title,

test=# explain analyze select * from mytable where to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..277.35 rows=10 width=83) (actual time=0.111..75.549 rows=10 loops=1)
   ->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83) (actual time=0.110..75.546 rows=10 loops=1)
         Filter: (to_tsvector('english'::regconfig, (title)::text) @@ '''abc'' | ''def'''::tsquery)
         Rows Removed by Filter: 10221
 Planning time: 0.176 ms
 Execution time: 75.564 ms
(6 rows)

The index is not used. Any ideas? The table has about 10 million rows.


Solution

  • There is a typo in your index definition; it should be

    ON mytable USING gin (to_tsvector('english', title))
    

    instead of

    ON mytable USING gin (to_tsvector('english', 'title'))
    

    The way you wrote it, what gets indexed is the string constant 'title', not the title column, and an index on a constant is indeed useless for a search like the one you perform. The corrected definition is sketched below.
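    A minimal sketch of the fix, assuming the broken index can simply be dropped and rebuilt (index and table names taken from your question):

    -- Drop the index on the constant expression
    DROP INDEX name_fts;

    -- Index the column, not the string literal 'title'
    CREATE INDEX name_fts ON mytable USING gin (to_tsvector('english', title));

    -- Re-run the query; the WHERE expression now matches the indexed
    -- expression exactly, so the planner can use a bitmap scan on the index
    EXPLAIN ANALYZE
    SELECT * FROM mytable
    WHERE to_tsvector('english', title) @@ 'abc | def'::tsquery
    LIMIT 10;

    Note that the expression in the query must match the indexed expression exactly, including the 'english' configuration, or the index will not be considered.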

    To see if an index can be used, you can execute

    SET enable_seqscan=off;
    

    and then run the query again. Setting enable_seqscan = off does not forbid sequential scans outright; it just makes them look prohibitively expensive to the planner, so any usable index will be preferred. If the index is still not used, it probably cannot be used for this query.
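    For example, a quick session-local check might look like this (remember to reset the setting afterwards):

    SET enable_seqscan = off;

    -- If the index is usable, the plan should now show a
    -- Bitmap Index Scan on name_fts instead of a Seq Scan
    EXPLAIN
    SELECT * FROM mytable
    WHERE to_tsvector('english', title) @@ 'abc | def'::tsquery
    LIMIT 10;

    RESET enable_seqscan;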

    In addition to the above, something about your execution plan strikes me as strange: PostgreSQL estimates that the sequential scan of mytable will return only 13744 rows, far fewer than one would expect given the 10 million rows you say are there. Did you disable autovacuum, or is there something else that could make your table statistics that inaccurate?
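    To check that, you could compare the planner's statistics with the real row count and refresh them if they look stale (pg_class.reltuples and pg_stat_user_tables are standard PostgreSQL catalogs and views):

    -- The planner's current idea of the table size
    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE relname = 'mytable';

    -- When the table was last vacuumed and analyzed
    SELECT last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'mytable';

    -- Refresh the statistics manually if they are stale
    ANALYZE mytable;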