How to filter documents based on a list of metadata in LangChain's Chroma VectorStore?


I'm working with LangChain's Chroma VectorStore, and I'm trying to filter documents based on a list of document names.

I have a list of document names as follows:

lst = ['doc1', 'doc2', 'doc3']

Each document in my VectorStore has a doc_name metadata field. Currently, I’m using the following code to retrieve documents:

base_retriever = chroma_db.as_retriever(search_kwargs={'k': 10})

However, I’m not sure how to modify this code to filter documents based on my list of document names. Could anyone guide me on how to achieve this? Any help would be greatly appreciated!


Solution

Chroma metadata filters support an $in operator, so you can pass the list through the filter key of search_kwargs. The filter key must be the metadata field name you stored, which in your case is doc_name:

    # List of document names to keep
    lst = ['doc1', 'doc2', 'doc3']

    # Metadata filter: match documents whose doc_name is one of the names in lst
    filter_dict = {"doc_name": {"$in": lst}}

    # Pass the filter to the retriever through search_kwargs, alongside k
    base_retriever = chroma_db.as_retriever(search_kwargs={'k': 10, 'filter': filter_dict})

    # Queries through this retriever only return chunks from the listed documents
    query = "your search query"
    results = base_retriever.invoke(query)
    for result in results:
        print(result)
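
If the list of names is built at runtime, it may help to construct search_kwargs in one place. The sketch below is only an illustration: build_search_kwargs and its parameters are hypothetical names, not part of LangChain or Chroma. The empty-list check is an assumption about how you want that case handled. As far as I know, Chroma rejects an empty list for $in, so failing early gives a clearer error.

```python
from typing import Any, Dict, List


def build_search_kwargs(doc_names: List[str], k: int = 10,
                        field: str = "doc_name") -> Dict[str, Any]:
    """Build search_kwargs that restrict retrieval to the given document names.

    Hypothetical helper (not a LangChain or Chroma API). `field` must match
    the metadata key that was stored with the documents.
    """
    # Drop duplicate names while keeping their original order.
    names = list(dict.fromkeys(doc_names))
    if not names:
        # Assumption: an empty list is a caller error, not "match nothing".
        raise ValueError("doc_names must contain at least one name")
    return {"k": k, "filter": {field: {"$in": names}}}


search_kwargs = build_search_kwargs(["doc1", "doc2", "doc3"])
print(search_kwargs)

# With the chroma_db store from the question:
# base_retriever = chroma_db.as_retriever(search_kwargs=search_kwargs)
```

If you do not need a retriever object, the same filter dictionary can be passed directly, for example chroma_db.similarity_search(query, k=10, filter=search_kwargs["filter"]).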