langchain, py-langchain, chromadb

Get all documents from ChromaDB using Python and LangChain


I'm using LangChain to process a large set of documents that are stored in a MongoDB database.

I can load all the documents into the Chroma vector store using LangChain without any problems. Nothing fancy is being done here. This is my code:


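from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# `docs` holds the documents loaded earlier (loading code not shown)
embeddings = OpenAIEmbeddings()

# Embed the documents and store them in a Chroma collection persisted under ./db
db = Chroma.from_documents(docs, embeddings, persist_directory='db')
db.persist()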

Now, after storing the data, I want to get a list of all the documents and embeddings together with their IDs.

This is so I can store them back into MongoDB.

I also want to run them through BERTopic to get the topic categories.

Question 1 is: how do I get all the documents I've just stored in the Chroma database? I want the documents and all the metadata.

Many thanks for your help!


Solution

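  • Looking at the source code (https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/chroma.py), the Chroma wrapper exposes a get() method.

    You can just call:

    db.get()

    and you will get a dictionary with the IDs, documents, and metadata of everything in the collection. Depending on the installed version, the embeddings may come back empty unless you request them explicitly; in versions where get() accepts an include argument, that would be db.get(include=["embeddings", "documents", "metadatas"]).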
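
Since the goal is to write the results back to MongoDB, here is a minimal sketch of reshaping the output into one record per document. It assumes the value returned by db.get() is a dict of parallel lists under the keys "ids", "documents", "metadatas" and (optionally) "embeddings"; check this against your installed version. The helper names, the record field names, and the sample data are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, List


def _column(result: Dict[str, Any], key: str, n: int) -> List[Any]:
    """Return result[key] as a list, or n placeholders if it is missing or None."""
    value = result.get(key)
    return [None] * n if value is None else list(value)


def records_from_get(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the column-oriented dict (assumed shape of db.get()) into row records."""
    ids = list(result["ids"])
    n = len(ids)
    documents = _column(result, "documents", n)
    metadatas = _column(result, "metadatas", n)
    embeddings = _column(result, "embeddings", n)  # often None unless requested
    return [
        {"_id": doc_id, "text": text, "metadata": meta, "embedding": emb}
        for doc_id, text, meta, emb in zip(ids, documents, metadatas, embeddings)
    ]


# Hypothetical sample shaped like the assumed db.get() output.
sample = {
    "ids": ["a1", "b2"],
    "documents": ["first text", "second text"],
    "metadatas": [{"source": "mongo"}, {"source": "mongo"}],
    "embeddings": None,
}

records = records_from_get(sample)
for record in records:
    print(record["_id"], record["text"], record["metadata"])
```

With a real store you would pass db.get() (or the include variant, if available) instead of sample. The resulting list of dicts is in a shape that could then be handed to something like PyMongo's insert_many, and the "text" values could be collected into a list for BERTopic.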