chromadb

On a ChromaDB text query, is there any way to retrieve the query_text embeddings?


When I run a ChromaDB query like this:

results = collection.query(
    query_texts=["AUSSIE SHAMPOO MIRACULOUSLY SMOOTH 180 ML x 1"],
    n_results=3,
    include=["documents", "distances", "embeddings"],
)

I am able to retrieve data from the vector database, but I am interested in obtaining the embeddings of the query_texts ("AUSSIE SHAMPOO MIRACULOUSLY SMOOTH 180 ML x 1") because I plan to add them to the collection (vector database) after completing some processing. Is there any way to do that?

I know I can simply run my embedding function on the query text myself, but since the ChromaDB query already embeds it, it would be more efficient to just retrieve that embedding.


Solution

  • You can create your embedding function explicitly (instead of relying on the default), e.g. using OpenAI:

    import os

    from chromadb.utils import embedding_functions

    # Reads the API key from the environment; supply it however you prefer
    openai_ef = embedding_functions.OpenAIEmbeddingFunction(
        api_key=os.environ["OPENAI_API_KEY"], model_name="text-embedding-ada-002"
    )
    

    or, to stick with the default:

    default_ef = embedding_functions.DefaultEmbeddingFunction()
    

    You'd then typically pass that to the collection like this:

    collection = chroma_client.get_collection(
        name="my_collection", embedding_function=openai_ef
    )
    

    and use your collection normally.

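    However, to answer your question: Chroma's query results don't include the embedding it computes for query_texts (include=['embeddings'] returns the embeddings of the matched documents, not of the query). So embed the query once yourself:

        embedding = openai_ef(["AUSSIE SHAMPOO MIRACULOUSLY SMOOTH 180 ML x 1"])

    and pass that embedding to the collection through query_embeddings (instead of query_texts) to find similar documents, so the text is embedded only once:

        results = collection.query(query_embeddings=embedding, n_results=3)

    You can then keep the same embedding and store it later with collection.add(..., embeddings=embedding).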

    See: https://docs.trychroma.com/embeddings