c++ facebook knn faiss

C++ Faiss - How to search in subsets


According to the Faiss wiki page (link), you should be able to use SearchParameters to selectively include or exclude IDs in a search. The information there is confusing, though, because the field "sel" does not exist at all. The names were also changed, so that "SearchParametersIVFPQ" became "IVFPQSearchParameters", and the old names can no longer be found. Moreover, the search method does not even accept SearchParameters, although according to the wiki it should.

I tried to find a solution with Visual Studio's IntelliSense, but without success...

So the documentation seems to be outdated... Does anyone know how this works today?


Solution

  • This was driving me mad too! I've put together a small working example in Python below. TL;DR: the selector needs to be passed as the sel argument of faiss.SearchParametersIVF.

    Let's start by creating a simple index and searching the whole thing:

    import numpy as np
    import faiss
    
    # Create a set of 5 small binary vectors (Faiss expects float32 input)
    vectors = np.array([[1, 0, 1],
                        [0, 1, 0],
                        [1, 1, 0],
                        [0, 0, 1],
                        [1, 0, 0]], dtype=np.float32)
    
    # Initialize a flat index using squared L2 distance
    # (on 0/1 vectors this equals the Hamming distance)
    index = faiss.IndexFlatL2(vectors.shape[1])
    
    # Add vectors to the index
    index.add(vectors)
    
    # Perform a similarity search
    query_vector = np.array([[1, 1, 0]], dtype=np.float32)
    k = 3  # Number of nearest neighbors to retrieve
    
    distances, indices = index.search(query_vector, k)
    print(indices)
    

    The output when you run this is [[2 1 4]], so the closest vectors are the ones with those IDs. Now let's filter out ID 4 and see what happens. This is done by creating a selector that lists the IDs allowed in the results and then passing it to faiss.SearchParametersIVF.

    # IDs that may appear in the results (everything except 4)
    filter_ids = [0, 1, 2, 3]
    id_selector = faiss.IDSelectorArray(filter_ids)
    filtered_distances, filtered_indices = index.search(query_vector, k, params=faiss.SearchParametersIVF(sel=id_selector))
    print(filtered_indices)
    

    This outputs [[2 1 0]], so the vector with ID 4 was excluded from the search!
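
    If it helps to see what the selector is doing conceptually, here is a rough pure-Python sketch of a brute-force search restricted to an allow-list of IDs. This is not how Faiss implements it internally; filtered_knn and allowed_ids are hypothetical names made up for this illustration, and ties are broken by the lower ID, which is an assumption of the sketch.

```python
def filtered_knn(vectors, query, k, allowed_ids=None):
    """Brute-force k-NN by squared L2 distance, limited to allowed_ids.

    Hypothetical helper for illustration only; not part of Faiss.
    """
    # With no allow-list, every stored vector is a candidate
    if allowed_ids is None:
        allowed = set(range(len(vectors)))
    else:
        allowed = set(allowed_ids)

    scored = []
    for idx, vec in enumerate(vectors):
        if idx not in allowed:
            continue  # skipped before any distance is computed
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))

    # Sort by distance, then by ID for ties (assumed tie-breaking rule)
    scored.sort()
    return scored[:k]


vectors = [[1, 0, 1],
           [0, 1, 0],
           [1, 1, 0],
           [0, 0, 1],
           [1, 0, 0]]
query = [1, 1, 0]

unfiltered = filtered_knn(vectors, query, 3)
filtered = filtered_knn(vectors, query, 3, allowed_ids=[0, 1, 2, 3])

print([idx for _, idx in unfiltered])  # [2, 1, 4]
print([idx for _, idx in filtered])    # [2, 1, 0]
```

    With the allow-list in place, ID 4 never becomes a candidate, so the next closest vector (ID 0, at squared distance 2) takes its spot in the top 3.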