Tags: word-embedding, langchain, chromadb

Chroma database: embeddings is None when using get()


I am a brand new user of the Chroma database (and the associated Python libraries).

When I call get on a collection, embeddings is always None, even when the embeddings are explicitly provided while adding documents to the collection (so I don't think the problem is with generating the embeddings).

For the following code (Python 3.10, chromadb 0.3.26), I expected to see a list of embeddings in the returned dictionary, but it is None.

import chromadb

chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="my_collection")
collection.add(
    embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)

print(collection.get())

Output:

{'ids': ['id1', 'id2'], 'embeddings': None, 'documents': ['This is a document', 'This is another document'], 'metadatas': [{'source': 'my_source'}, {'source': 'my_source'}]}

The same issue does not occur when using query instead of get:

print(collection.query(query_embeddings=[[1.2, 2.3, 4.4]], include=["embeddings"]))

Output:

{'ids': [['id1', 'id2']], 'embeddings': [[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]]], 'documents': None, 'metadatas': None, 'distances': None}

The same issue occurs when using the langchain wrappers.

Any ideas, friends? :-)


Solution

  • According to the documentation (https://docs.trychroma.com/usage-guide), embeddings are excluded by default for performance:

    When using get or query you can use the include parameter to specify which data you want returned - any of embeddings, documents, metadatas, and for query, distances. By default, Chroma will return the documents, metadatas and in the case of query, the distances of the results. embeddings are excluded by default for performance and the ids are always returned.

    You can include the embeddings when using get as follows:

    print(collection.get(include=['embeddings', 'documents', 'metadatas']))
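
    If you only need the vectors, here is a minimal sketch of a helper that requests them explicitly and pairs each id with its embedding. The name embeddings_by_id is hypothetical (it is not part of chromadb); the only Chroma call it relies on is collection.get(include=["embeddings"]), and it assumes the result is a dictionary with 'ids' and 'embeddings' keys, as in the output shown in the question.

```python
from typing import Any, Dict, List


def embeddings_by_id(collection: Any) -> Dict[str, List[float]]:
    """Map each stored id to its embedding vector.

    `collection` is assumed to be a chromadb Collection, or any object
    with a compatible `get(include=...)` method (hypothetical helper).
    """
    # Embeddings are left out of get() unless requested through `include`;
    # ids are always returned, so they do not need to be listed.
    result = collection.get(include=["embeddings"])

    ids = result["ids"]
    embeddings = result["embeddings"]
    if embeddings is None:
        # Nothing was returned for the embeddings field.
        return {}

    # Pair every id with its vector, converting the values to plain floats.
    return {id_: [float(x) for x in vec] for id_, vec in zip(ids, embeddings)}
```

    With the collection from the question, embeddings_by_id(collection) should return a dictionary keyed by 'id1' and 'id2' holding the vectors that were passed to add.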