I am trying to do semantic search with sentence transformers and faiss.
I am able to generate embeddings from a corpus and run a search with the query xq.
But what are t
import faiss
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("flax-sentence-embeddings/st-codesearch-distilroberta-base")

def get_embeddings(code_snippets: str):
    return model.encode(code_snippets)

def build_vector_database(atlas_datapoints):
    dimension = 768  # dimensions of each vector
    corpus = ["tom loves candy",
              "this is a test"
              "hello world"
              "jerry loves programming"]
    code_snippet_emddings = get_embeddings(corpus)
    print(code_snippet_emddings.shape)
    d = code_snippet_emddings.shape[1]
    index = faiss.IndexFlatL2(d)
    print(index.is_trained)
    index.add(code_snippet_emddings)
    print(index.ntotal)
    k = 2
    xq = model.encode(["jerry loves candy"])
    D, I = index.search(xq, k)  # search
    print(I)
    print(D)
This code returns
[[0 1]]
[[1.3480902 1.6274161]]
But I can't tell which sentence xq is matching with; I only get the indices and the matching scores.
How can I find the top-N matching strings from the corpus?
To retrieve the query results, try something like this using the variables from your code.
[corpus[i] for i in I[0]]
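If you also want each match's distance next to its sentence, the indices and distances can be zipped together. This is only a minimal sketch: the I and D arrays below are hard-coded, made-up stand-ins shaped like the output of index.search() for a single query, and top_matches is a hypothetical helper name, not part of faiss.

```python
import numpy as np

corpus = ["tom loves candy", "this is a test", "hello world", "jerry loves programming"]

# Stand-ins for the (distances, indices) pair returned by index.search(xq, k).
# Both have shape (n_queries, k). The values are made up for illustration.
D = np.array([[0.8, 1.2]], dtype="float32")
I = np.array([[0, 3]], dtype="int64")

def top_matches(corpus, D, I, query_idx=0):
    """Return (sentence, distance) pairs for one query, in the order faiss gave them."""
    return [(corpus[int(i)], float(d)) for i, d in zip(I[query_idx], D[query_idx])]

matches = top_matches(corpus, D, I)
for sentence, distance in matches:
    print(f"{distance:.4f}  {sentence}")
```

With a real index you would pass the D and I returned by search() instead of the stand-ins.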
But if you have corpus as an np.array object, you can do some cool indexing like this:
import numpy as np
# If your corpus is in array form.
corpus = np.array(['abc def', 'foo bar', 'bar bar sheep'])
# And indices can be list of integers.
indices = [1,0]
# Results.
corpus[indices]
And it can get a little cooler if your indices are already an np.array, like the output of faiss. For example, with 2 queries the result has shape 2 x k:
import numpy as np
corpus = np.array(['abc def', 'foo bar', 'bar bar sheep'])
indices = np.array([[1,0], [0,2]])
corpus[indices]
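One caveat with indexing the array directly: faiss fills a slot with -1 when it finds fewer than k results, and NumPy reads -1 as "last element", so a missing hit would silently map to the last sentence. A possible guard, assuming the corpus is a NumPy array of strings (lookup and its missing parameter are hypothetical names for this sketch):

```python
import numpy as np

corpus = np.array(['abc def', 'foo bar', 'bar bar sheep'])

def lookup(corpus, indices, missing=""):
    """Map an (n_queries, k) index matrix to strings, blanking -1 slots."""
    indices = np.asarray(indices)
    valid = indices >= 0
    # Swap -1 for a safe index (0) before indexing, then blank those slots.
    picked = corpus[np.where(valid, indices, 0)]
    return np.where(valid, picked, missing)

# Second query has only one real hit; the other slot is -1.
indices = np.array([[1, 0], [2, -1]])
results = lookup(corpus, indices)
print(results)
```

The result keeps the same 2 x k shape as the index matrix, so row i still corresponds to query i.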
The faiss.IndexFlatL2 object returns these through the search() function:
I in your code snippet refers to the indices of the top-K results.
D in your code snippet refers to the distances of the top-K results from your query string (smaller means closer).
Since you have only 1 query, n=1, so your I and D matrices are both of size 1 x k.
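To make the shapes concrete, here is a NumPy-only sketch of what a flat L2 search computes: the squared L2 distance from each query to every stored vector, then the k smallest per query. It is an illustration of the idea rather than faiss's actual implementation, and flat_l2_search, xb and the toy vectors are made up for this example.

```python
import numpy as np

def flat_l2_search(xb, xq, k):
    """Brute-force search: return (D, I), each of shape (n_queries, k)."""
    # Pairwise squared L2 distances, shape (n_queries, n_vectors).
    diff = xq[:, None, :] - xb[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Indices of the k nearest vectors per query, nearest first.
    I = np.argsort(dist, axis=1)[:, :k]
    D = np.take_along_axis(dist, I, axis=1)
    return D, I

# Three stored 2-d vectors and a single query, so n = 1.
xb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype="float32")
xq = np.array([[0.9, 0.0]], dtype="float32")

D, I = flat_l2_search(xb, xq, k=2)
print(I.shape, D.shape)  # (1, 2) (1, 2)
print(I)
```

Row i of I and D belongs to query i, which is why a single query is read with I[0] and D[0].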