I want to attach additional metadata to the documents that are embedded and loaded into Chroma.
I can't find a way to add metadata to documents loaded with
Chroma.from_documents(documents, embeddings)
For example, suppose I have a text file with details of a particular disease. I want to add species as metadata: a list of all the species the disease affects.
As a workaround, I loaded the documents into a chromadb collection directly, adding the required metadata, and persisted it:
import chromadb

# openai_ef, documents, ids and metadata are defined elsewhere
client = chromadb.PersistentClient(path="chromaDB")
collection = client.get_or_create_collection(
    name="test",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},
)
collection.add(
    documents=documents,
    ids=ids,
    metadatas=metadata,
)
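The documents, ids and metadata lists passed to collection.add() are not shown above. A minimal sketch of how they could be built as parallel lists follows. The helper name build_chroma_inputs and the sample texts are hypothetical, and the species list is joined into a single string on the assumption that Chroma metadata values have to be scalars (str, int, float or bool) rather than lists.

```python
def build_chroma_inputs(texts_by_file, species_by_file):
    """Return parallel (documents, ids, metadatas) lists for collection.add().

    Hypothetical helper: both arguments are dicts keyed by file name.
    """
    documents, ids, metadatas = [], [], []
    for i, (filename, text) in enumerate(texts_by_file.items()):
        documents.append(text)
        ids.append(f"id{i}")
        metadatas.append({
            # Assumption: metadata values must be scalars, so the list of
            # species is flattened into one comma-separated string.
            "species": ", ".join(species_by_file.get(filename, [])),
            "source": filename,
        })
    return documents, ids, metadatas


# Placeholder texts; the real content would come from the disease files.
texts = {
    "Flu.txt": "Details about flu ...",
    "Common_cold.txt": "Details about the common cold ...",
}
species = {"Flu.txt": ["XYZ"], "Common_cold.txt": ["ABC"]}

documents, ids, metadatas = build_chroma_inputs(texts, species)
```

The three lists line up by position, so the entry at index i in each list describes the same document.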
This was the result:
collection.get(include=['embeddings','metadatas'])
Output:
{'ids': ['id0',
'id1'],
'embeddings': [[-0.014580891467630863,
0.0003901976451743394,
0.00793908629566431,
-0.027648288756608963,
-0.009689063765108585,
0.010222840122878551,
-0.00946609303355217,
-0.002771923551335931,
-0.04675614833831787,
-0.02056729979813099,
0.014364678412675858,
...
'metadatas': [{'species': 'XYZ', 'source': 'Flu.txt'},
{'species': 'ABC', 'source': 'Common_cold.txt'}],
'documents': None,
'uris': None,
'data': None}
Next I tried loading the persisted directory from disk through LangChain's Chroma class:
db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)
But nothing appears to be loaded. db.get() returns this:
db.get(include=['metadatas'])
Output:
{'ids': [],
'embeddings': None,
'metadatas': [],
'documents': None,
'uris': None,
'data': None}
Please help. I need to attach metadata to the documents being loaded.
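Found the answer myself.
I hadn't specified the collection name when loading, so LangChain fell back to its default collection name ("langchain") and opened an empty collection rather than the one I had populated.
Instead of this: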
db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)
do this:
db = Chroma(persist_directory="chromaDB", embedding_function=embeddings, collection_name='your_collection_name')
In my case, the collection name is 'test'.
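For the original goal of attaching metadata through Chroma.from_documents(), each LangChain Document carries its own metadata dict, and from_documents() also accepts collection_name and persist_directory. The following is only a sketch under assumptions: the import paths (langchain_core and langchain_community) depend on the installed LangChain version, the helper names are hypothetical, and embeddings stands for whatever embedding object is already in use.

```python
def make_documents(texts, metadatas):
    """Pair each text with its metadata dict as a LangChain Document."""
    # Imported lazily; the import path is an assumption and varies by version.
    from langchain_core.documents import Document
    return [Document(page_content=t, metadata=m) for t, m in zip(texts, metadatas)]


def build_store(docs, embeddings, persist_directory="chromaDB", collection_name="test"):
    """Embed docs into a named, persisted Chroma collection."""
    from langchain_community.vectorstores import Chroma
    return Chroma.from_documents(
        docs,
        embedding=embeddings,
        persist_directory=persist_directory,
        collection_name=collection_name,
    )


def open_store(embeddings, persist_directory="chromaDB", collection_name="test"):
    """Reopen the same collection later; the name must match the one used above."""
    from langchain_community.vectorstores import Chroma
    return Chroma(
        persist_directory=persist_directory,
        embedding_function=embeddings,
        collection_name=collection_name,
    )


# Usage (requires LangChain, chromadb and an embeddings object):
# docs = make_documents(["Details about flu ..."],
#                       [{"species": "XYZ", "source": "Flu.txt"}])
# db = build_store(docs, embeddings)
# db = open_store(embeddings)
# db.get(include=["metadatas"])
```

Using the same collection_name default in both helpers keeps the write and read paths pointed at the same collection, which is the mismatch that caused the empty result in the question.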