We had an issue a couple of weeks back where our text-index query wasn't yielding any results. This was the expression:
"test_column" @@ websearch_to_tsquery('english', "in_search_argument")
We had to update it to:
"test_column_idx" @@ websearch_to_tsquery('english', "in_search_argument")
..where test_column_idx
is defined as:
"test_column_idx" gin (to_tsvector('english'::regconfig, 'test_column'::text))
Our server is 14.6 (with psql 12.16).
However, my development environment complains about this:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column p.test_column_idx does not exist
If I change it back to use the column (see above; it matched just fine in our development environment even if our production didn't), it works again. Our development environment uses server 14.9 (we closed the gap between versions while debugging) and psql 15.4.
What syntax should I be using to maintain compatibility between environments?
Added (thanks @RichardHuxton):
So, SQLAlchemy creates it as an index in development
"test_column_idx" gin (to_tsvector('english'::regconfig, 'test_column'::text))
The SQLAlchemy definition looks like:
sqlalchemy.Index(
'test_column_idx',
sqlalchemy.sql.func.to_tsvector(
sqlalchemy.literal(_TEXT_INDEX_LANGUAGE),
'test_column'),
postgresql_using='gin'),
When I actually deployed it, I had manually added it to production as a column:
test_column_idx | tsvector | | | generated always as (to_tsvector('english'::regconfig, test_column::text)) stored
So, can anyone suggest how I might adjust the SQLAlchemy definition to create the column instead?
That said, I'm confused how index semantics can be installed as both a column and an index, and the situations where one or the other would be preferred. I could use some clarification.
Thank you.
The issue was simple, but the proper flow is also deceptively simple and what I did subtly deviated.
Somehow we had ended-up with a tsvector column and no index, our search function used this column, and to make matters more confusing, I had inadvertently named this column the intended name of the index (idx_product_1
). So, when I was looking for bugs and reviewing the definition of the search function, I had missed this the first couple of times.
Since what we had worked exactly as intended (see where the issue was, below), we were even more confused. We finally figured-out that:
Therefore, we did this to autopopulate the tsvector column:
ALTER TABLE "products" ADD "style_name_vector" tsvector GENERATED ALWAYS AS (
to_tsvector('simple', "internal_sku") || ' ' ||
to_tsvector('english', "human_brand") || ' ' ||
to_tsvector('simple', "style_id") || ' ' ||
to_tsvector('english', "style_name") || ' ' ||
to_tsvector('english', "gender") || ' ' ||
to_tsvector('simple', shopify_product_id::varchar(255))
) STORED;
Because the tsvector column is a space-separated list of word-vector representations, we can simply concatenate a bunch of vectored columns in order to search all of those columns simultaneously. This was an optimization not in the previous design.
Create the index:
CREATE INDEX "idx_product_1" ON "products" USING GIN("style_name_vector");
Do a search (just for a complete example):
analytics=> select internal_sku, style_name from products where style_name_vector @@ websearch_to_tsquery('english', 'trail') limit 5;
internal_sku | style_name
--------------+----------------------------------------
TN100000926 | Mens Paramount Trail Convertible Pants
RE000127387 | Trail 40
PA000010810 | Womens Capilene Cool Trail Shirt
SA000023516 | S Lab trail Running Shirt
MA100000254 | Womens Vapor Trail Hoodie