sqlpostgresqltext-searchtsvector

Postgresql, tsquery doesn't work with part of string


I'm using postgres' tsquery function to search in a field that might contain letters in multiple languages and numbers. it seems that in every case the search works up to a part of the searched phrase and stops working until you write the full phrase.

for example: searching for the name '15339' outputs the right row when the search term is '15339' but if it's '153' it won't.

searching for Al-Alamya, if the term is 'al-' it will work and return the row, but adding letters after that, for example, 'al-alam' won't return it until I finish writing the full name ('Al-Alamya').

my query:

SELECT *
FROM (SELECT DISTINCT ON ("consumer_api_spot"."id") "consumer_api_spot"."id",
                                                    "consumer_api_spot"."name",

      FROM "consumer_api_spot"
               INNER JOIN "consumer_api_account" ON ("consumer_api_spot"."account_id" = "consumer_api_account"."id")
               INNER JOIN "users_user" ON ("consumer_api_account"."id" = "users_user"."account_id")

      WHERE (
                    users_user.id = 53 AND consumer_api_spot.active
                    AND
                    "consumer_api_spot"."vectorized_name" @@ tsquery('153')
                )
      GROUP BY "consumer_api_spot"."id"
     ) AS "Q"
LIMIT 50 OFFSET 0

Solution

  • If you check the documentation, you'll find more information about what you can specify as a tsquery. They support grouping, combining using boolean operations, and also prefixing which is probably something you want. An example from the docs:

    Also, lexemes in a tsquery can be labeled with * to specify prefix matching:

    SELECT 'super:*'::tsquery;
    

    This query will match any word in a tsvector that begins with “super”.

    So in your query you should modify the part of tsquery('153') to tsquery('153:*').

    Btw. I don't know exactly how you constructed your database schema, but you can add a tsvector index for a column using a GIN index. I will assume that you generate the "consumer_api_spot"."vectorized_name" column from a "consumer_api_spot"."name" column. If that's the case you can create a tsvector index for that column like this:

    CREATE INDEX gin_name on consumer_api_spot using gin (to_tsvector('english',name))
    

    And then you could change this query:

    "consumer_api_spot"."vectorized_name" @@ tsquery('153')
    

    into this:

    to_tsvector('english', "consumer_api_spot"."name") @@ to_tsquery('english', '153:*')
    

    and get a potential speed benefit, because the query would utilize an index.

    Note about the 'english': You cannot omit the language, when creating the index, but it won't have an effect on queries in other languages, or queries with numbers. However, be careful, the language must be the same for creating the index and performing the query to enable PostgreSQL to use the index.