langchain, chromadb

Unable to find a Chroma vector DB collection created with LangChain


I am using this code to create a Chroma vector DB (I have omitted the non-essential parts of the code):

    import os
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings

    current_dir = os.path.dirname(os.path.abspath(__file__))
    file_path = os.path.join(current_dir, "books", "ulysses.txt")
    persistent_directory = os.path.join(current_dir, "db", "chroma_db_Ulysse")

    # `docs` is the list of chunked Documents loaded from file_path
    # (loading and splitting code omitted here)
    embeddings = OpenAIEmbeddings()

    # Persist the vectors to disk under a named collection
    db = Chroma.from_documents(
        docs,
        embeddings,
        persist_directory=persistent_directory,
        collection_name="CollectionUlysse",
    )

This works fine, but when I try to access the database directly with the chromadb library, it can't find the collection:

    import os

    import chromadb
    from chromadb.config import Settings

    current_dir = os.path.dirname(os.path.abspath(__file__))
    # Note: this path differs from the one used when creating the DB
    # ("chroma_db_Ulysse" above)
    persistent_directory = os.path.join(current_dir, "db", "chroma_db_Ulysse_is_back")

    print(f"Persistent directory path: {persistent_directory}")
    client = chromadb.Client(Settings(persist_directory=persistent_directory))

    # List the names of all collections the client can see
    collections = client.list_collections()
    collection_names = [col.name for col in collections]
    print("Available collections :", collection_names)

Output:

    Available collections : []

Could this be caused by differences between the chromadb library and LangChain's Chroma wrapper?
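Before changing clients, it may be worth ruling out a path mismatch: the reading script above opens chroma_db_Ulysse_is_back, while the DB was created in chroma_db_Ulysse. The following is a minimal standard-library sketch (not part of LangChain or chromadb) that walks a folder and reports every directory containing a chroma.sqlite3 file. It assumes chromadb 0.4 or later, which keeps its data in that file inside the persist directory; the function name find_chroma_stores is hypothetical.

```python
import os


def find_chroma_stores(root):
    """Return every directory under `root` that contains a chroma.sqlite3 file.

    Assumption: chromadb >= 0.4, which stores its data in chroma.sqlite3
    inside the persist directory. A missing `root` simply yields [].
    """
    stores = []
    # os.walk visits `root` and all of its subdirectories
    for dirpath, _dirnames, filenames in os.walk(root):
        if "chroma.sqlite3" in filenames:
            stores.append(dirpath)
    return sorted(stores)
```

For example, find_chroma_stores(os.path.join(current_dir, "db")) should list the .../db/chroma_db_Ulysse directory if that is where LangChain persisted the data; if the directory passed to the reading client is not in that list, the client is looking in the wrong place.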


Solution

  • Can you try using PersistentClient instead of Client with a config? Client is meant for programmatic configuration via environment variables or settings. In recent chromadb versions (0.4 and later), Settings also has an is_persistent flag that defaults to False, so supplying persist_directory alone may not be enough to create a persistent client; you end up with an in-memory client instead.

    I created a persistent directory with LangChain 🦜🔗, ran your code, and got the same result. After inspecting the sqlite3 file, I can confirm that the collection is indeed created, and using PersistentClient solved the problem:

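    import os

    import chromadb

    # Point at the same directory LangChain persisted to
    current_dir = os.path.dirname(os.path.abspath(__file__))
    persistent_directory = os.path.join(current_dir, "db", "chroma_db_Ulysse")

    # PersistentClient reads and writes the on-disk store at `path`
    client = chromadb.PersistentClient(path=persistent_directory)

    collections = client.list_collections()
    collection_names = [col.name for col in collections]
    print("Available collections :", collection_names)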
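To check what is on disk without going through any client, as in the sqlite3 inspection mentioned above, a minimal sketch using Python's built-in sqlite3 module could look like this. It assumes the chromadb 0.4+ layout (a chroma.sqlite3 file with a collections table that has a name column); that schema is internal and may change between versions, and list_collection_names is a hypothetical helper, not a chromadb API.

```python
import os
import sqlite3
from pathlib import Path


def list_collection_names(persist_directory):
    """Read collection names straight from Chroma's SQLite file.

    Assumption: chromadb >= 0.4 layout, i.e. a chroma.sqlite3 file whose
    `collections` table has a `name` column. This is an internal schema,
    so treat it as a debugging aid only, not a stable API.
    """
    db_path = os.path.join(persist_directory, "chroma.sqlite3")
    if not os.path.exists(db_path):
        raise FileNotFoundError(f"No chroma.sqlite3 in {persist_directory}")
    # Open read-only via a file: URI so the inspection cannot modify the store
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("SELECT name FROM collections ORDER BY name").fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, list_collection_names(persistent_directory) should include CollectionUlysse if LangChain wrote to that directory, which separates "the data is not there" from "the client is not reading it".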