According to the faiss wiki page (link), you should be able to use SearchParameters to selectively include or exclude IDs in a search. The information there is a bit strange, though, because the field "sel" does not seem to exist at all. The names were also changed, so that "SearchParametersIVFPQ" became "IVFPQSearchParameters", and the old names can no longer be found. Moreover, the search method does not even accept SearchParameters, although according to the wiki it should.
I tried to find a solution with Visual Studio's IntelliSense, but without success.
So the documentation seems to be outdated. Does anyone know how this works today?
This was driving me mad too! I've put together a small working example below. TL;DR: the selector needs to be passed as the sel argument to faiss.SearchParametersIVF, which is then handed to search via params.
Let's start by creating a simple index and searching the whole thing:
import numpy as np
import faiss
# Set random seed for reproducibility
np.random.seed(0)
# Create a set of 5 small binary vectors (faiss works on float32 data)
vectors = np.array([[1, 0, 1],
                    [0, 1, 0],
                    [1, 1, 0],
                    [0, 0, 1],
                    [1, 0, 0]], dtype=np.float32)
# Initialize a flat index using squared L2 distance (equal to Hamming distance for 0/1 vectors)
index = faiss.IndexFlatL2(vectors.shape[1])
# Add vectors to the index
index.add(vectors)
# Perform a similarity search
query_vector = np.array([[1, 1, 0]], dtype=np.float32)
k = 3 # Number of nearest neighbors to retrieve
distances, indices = index.search(query_vector, k)
print(indices)
The output when you run this is [[2 1 4]], so the closest vectors are at those indices. Now let's filter out index 4 and see what happens. This is done by creating the selector and then passing it to faiss.SearchParametersIVF.
filter_ids = [0, 1, 2, 3]
id_selector = faiss.IDSelectorArray(filter_ids)
filtered_distances, filtered_indices = index.search(query_vector, k, params=faiss.SearchParametersIVF(sel=id_selector))
print(filtered_indices)
This outputs [[2 1 0]], so the vector at index 4 (the fifth one) was excluded from the search!
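If you want to sanity-check those two results without faiss, here is a rough NumPy-only sketch of what the filtered search amounts to conceptually: compute squared L2 distances, mask out every ID that is not allowed, and take the top k. The helper name filtered_search is made up for illustration, and this is not how faiss implements selectors internally. It also assumes ties are broken in favour of the lower ID, which happens to match the output above but is not something I'd rely on from faiss.

```python
import numpy as np

def filtered_search(vectors, query, k, allowed_ids=None):
    """Brute-force top-k by squared L2 distance, optionally restricted to allowed_ids.

    Hypothetical helper for illustration only; not part of faiss.
    """
    vectors = np.asarray(vectors, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Squared L2 distance from the query to every stored vector
    dists = ((vectors - query) ** 2).sum(axis=1)
    if allowed_ids is not None:
        # Push every ID outside the allowed set to infinite distance
        mask = np.zeros(len(vectors), dtype=bool)
        mask[list(allowed_ids)] = True
        dists = np.where(mask, dists, np.inf)
    # Stable sort: equal distances keep their original (lower ID first) order
    order = np.argsort(dists, kind="stable")[:k]
    return dists[order], order

vectors = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
query = [1, 1, 0]

_, all_ids = filtered_search(vectors, query, k=3)
_, kept_ids = filtered_search(vectors, query, k=3, allowed_ids=[0, 1, 2, 3])
print(all_ids)   # [2 1 4]
print(kept_ids)  # [2 1 0]
```

The distances to the query are 2, 1, 0, 3, 1 for IDs 0 to 4, so IDs 1 and 4 tie at distance 1. Once ID 4 is masked out, the next closest candidate is ID 0 at distance 2, which is why it shows up in the filtered result.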