
Can I add metadata to documents loaded using Chroma.from_documents()?


I want to attach additional metadata to the documents that are embedded and loaded into Chroma, but I can't find a way to do that with
Chroma.from_documents(documents, embeddings)
For example, given a text file describing a particular disease, I want to add a species metadata field that lists all the species the disease affects.
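
As a point of reference for the question above: `Chroma.from_documents()` stores each document's `metadata` dict alongside its embedding, so one possible approach is to update that dict before the call. The sketch below is a minimal illustration under stated assumptions, not a definitive implementation. `Doc` is a stand-in for LangChain's `Document` (same `page_content` and `metadata` attributes) so the snippet runs without LangChain installed; `add_species` and `species_by_source` are hypothetical names. It also assumes that Chroma accepts only scalar metadata values (str, int, float, bool), which is why the species list is joined into a single string.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def add_species(docs, species_by_source):
    """Add a 'species' entry to each document's metadata, keyed by its source file."""
    for doc in docs:
        species = species_by_source.get(doc.metadata.get("source"), [])
        # Assumption: metadata values must be scalars, so the list is joined.
        doc.metadata["species"] = ", ".join(species)
    return docs


documents = [
    Doc("Details about flu ...", {"source": "Flu.txt"}),
    Doc("Details about the common cold ...", {"source": "Common_cold.txt"}),
]
species_by_source = {"Flu.txt": ["XYZ"], "Common_cold.txt": ["ABC", "DEF"]}
documents = add_species(documents, species_by_source)

# With real LangChain Document objects, the enriched list would then be passed on:
# db = Chroma.from_documents(documents, embeddings)
```

Joining the list keeps the value storable, at the cost of needing a substring or post-filter step if documents have to be selected by a single species later.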

As a workaround, I loaded the documents directly into a chromadb collection, adding the required metadata, and persisted it:

import chromadb

# openai_ef, documents, ids and metadata are defined elsewhere (not shown)
client = chromadb.PersistentClient(path="chromaDB")

collection = client.get_or_create_collection(name="test",
                                             embedding_function=openai_ef,
                                             metadata={"hnsw:space": "cosine"})
collection.add(
    documents=documents,
    ids=ids,
    metadatas=metadata
)

This was the result,

collection.get(include=['embeddings','metadatas'])

Output:

{'ids': ['id0',
'id1'],
'embeddings': [[-0.014580891467630863,
0.0003901976451743394,
0.00793908629566431,
-0.027648288756608963,
-0.009689063765108585,
0.010222840122878551,
-0.00946609303355217,
-0.002771923551335931,
-0.04675614833831787,
-0.02056729979813099,
0.014364678412675858,
...
'metadatas': [{'species': 'XYZ', 'source': 'Flu.txt'},
{'species': 'ABC', 'source': 'Common_cold.txt'}],
'documents': None,
'uris': None,
'data': None}

Next, I tried loading the collection from the persisted directory on disk using LangChain's Chroma class:

db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)

But nothing appears to be loaded. db.get() returns this:

db.get(include=['metadatas'])

Output:

{'ids': [],
'embeddings': None,
'metadatas': [],
'documents': None,
'uris': None,
'data': None}

Please help. I need the metadata to be available on the documents that are loaded.


Solution

  • Found the answer myself.

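    I had not specified the collection name when loading, so LangChain opened its default collection (named "langchain") instead of the one I had created, and that collection was empty.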

    Instead of doing this,

    db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)
    

    Do this

    db = Chroma(persist_directory="chromaDB", embedding_function=embeddings, collection_name="your_collection_name")
    

    In my case, the collection name is 'test'.
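
If the collection name is not known, the persisted directory can be inspected with the plain chromadb client. The helper below is a minimal sketch under assumptions: `collection_names` is a hypothetical helper, and because `list_collections()` has returned collection objects in some chromadb releases and plain names in others, it handles both. `FakeClient` is only a stand-in so the snippet runs without a database.

```python
def collection_names(client):
    """Return the names of all collections the client can see."""
    names = []
    for item in client.list_collections():
        # Some chromadb releases return Collection objects, others plain strings.
        names.append(item if isinstance(item, str) else item.name)
    return names


class FakeClient:
    """Stand-in for chromadb.PersistentClient, used only to exercise the helper."""

    def list_collections(self):
        return ["test"]


names = collection_names(FakeClient())

# With a real store (assumes chromadb is installed and "chromaDB" exists):
# import chromadb
# client = chromadb.PersistentClient(path="chromaDB")
# print(collection_names(client))
```

If this lists a collection such as 'test' while db.get() comes back empty, a collection-name mismatch like the one described above is the likely cause.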