I'm having some trouble returning the score of a vector search on PostgreSQL. I have created a HNSW index on the columns answer_embedding and answer_keyword_embedding.
In Python, I've written this function to return the 5 most common documents based on a QA answer embedding. It works:
similarity_score = (
(Document.answer_embedding.op('<->')(embedding_array)) +
(Document.answer_keyword_embedding.op('<->')(keyword_embedding_array))
)
query = (
select(Document.doc_id, Document.answer))
.order_by(similarity_score)
.limit(max_neighbors)
)
results = session.execute(query).all()
However, I also want to return the score of how relevant each document is. Whatever I do I see to get the following error message:
Session failed: 'float' object is not subscriptable
The following code snippets all give me the same error:
Session failed: 'float' object is not subscriptable
query = (
select(Document.doc_id, Document.answer,similarity_score.label("similarity"))
.order_by(similarity_score)
.limit(max_neighbors)
)
query = (
select(Document).add_columns(similarity_score.label("similarity"))
.order_by(similarity_score)
.limit(max_neighbors)
)
If I convert it to using a full query text instead, I get the error that the operator "<->"/"<=>" is not recognized.
When I print the query it looks like this.
Query: SELECT "Document"."DocId", "Document"."Answer", ("Document"."AnswerEmbedding" <-> :AnswerEmbedding_1) + ("Document"."AnswerKeywordEmbedding" <-> :AnswerKeywordEmbedding_1) AS similarity
FROM "Document" ORDER BY ("Document"."AnswerEmbedding" <-> :AnswerEmbedding_1) + ("Document"."AnswerKeywordEmbedding" <-> :AnswerKeywordEmbedding_1)
LIMIT :param_1
Since I am able to retrieve the documents, I'm having a hard time understanding why I can't get the score for each document.
Providing traceback below:
File "path/to/database.py", line 142, in retrieve_documents
results = session.execute(query).scalars().all()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "path/to/sqlalchemy/engine/result.py", line 1769, in all
return self._allrows()
^^^^^^^^^^^^^^^
File "path/to/sqlalchemy/engine/result.py", line 548, in _allrows
rows = self._fetchall_impl()
^^^^^^^^^^^^^^^^^^^^^
File "path/to/sqlalchemy/engine/result.py", line 1676, in _fetchall_impl
return self._real_result._fetchall_impl()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "path/to/sqlalchemy/engine/result.py", line 2270, in _fetchall_impl
return list(self.iterator)
^^^^^^^^^^^^^^^^^^^
File "path/to/sqlalchemy/orm/loading.py", line 219, in chunks
fetch = cursor._raw_all_rows()
^^^^^^^^^^^^^^^^^^^^^^
File "path/to/sqlalchemy/engine/result.py", line 541, in _raw_all_rows
return [make_row(row) for row in rows]
^^^^^^^^^^^^^
File "path/to/sqlalchemy/cyextension/resultproxy.pyx", line 22, in sqlalchemy.cyextension.resultproxy.BaseRow.__init__
File "path/to/sqlalchemy/cyextension/resultproxy.pyx", line 79, in sqlalchemy.cyextension.resultproxy._apply_processors
File "path/to/pgvector/sqlalchemy/vector.py", line 33, in process
return Vector._from_db(value)
^^^^^^^^^^^^^^^^^^^^^^
File "path/to/pgvector/utils/vector.py", line 71, in _from_db
return cls.from_text(value).to_numpy().astype(np.float32)
^^^^^^^^^^^^^^^^^^^^
File "path/to/pgvector/utils/vector.py", line 36, in from_text
return cls([float(v) for v in value[1:-1].split(',')])
~~~~~^^^^^^
TypeError: 'float' object is not subscriptable
I solved it by changing the similarity score to this:
similarity_score = (
(Document.answer_embedding.cosine_distance(embedding)) +
(Document.answer_keyword_embedding.cosine_distance(keyword_embedding))
)