Tags: python, valueerror, langchain, chromadb, openaiembeddings

ValueError: Expected EmbeddingFunction.__call__ to have the following signature


When I try to pass a Chroma Client to Langchain that uses OpenAIEmbeddings, I get a ValueError:

ValueError: Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs'])

How do I resolve this error?

The error appears to be related to Chroma's latest update: langchain's embedding-function implementation no longer matches the signature that newer Chroma versions require, and the issue only showed up after I upgraded Chroma.

My code:

import chromadb
from langchain_openai import OpenAIEmbeddings
client = chromadb.PersistentClient()
collection = client.get_or_create_collection(
    name='chroma', 
    embedding_function=OpenAIEmbeddings()
)

I have langchain==0.1.1, langchain-openai==0.0.3 and chromadb==0.4.22. Looking into GitHub issues, downgrading chromadb to 0.4.15 seems to solve the problem, but since these libraries will keep evolving over the coming months, I'd rather find a solution that works with the current Chroma version than downgrade.


Solution

  • Since version 0.4.16, Chroma requires an embedding function whose __call__() method takes a single input argument and returns a list of embeddings, i.e. the odict_keys(['self', 'input']) signature from the error message. The migration guide linked in the error says as much.
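
    For reference, this is roughly the interface Chroma now expects: a minimal sketch based on the EmbeddingFunction protocol that chromadb exports (the class name MyEmbeddingFunction is just a placeholder):

    from chromadb import Documents, EmbeddingFunction, Embeddings

    class MyEmbeddingFunction(EmbeddingFunction):
        def __call__(self, input: Documents) -> Embeddings:
            # must accept a single `input` argument (a list of documents)
            # and return one embedding vector per document
            ...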

    Given that we need a method that returns a list of embeddings, and OpenAIEmbeddings already provides one (embed_documents()), the easiest solution I found is to create a custom class that inherits from OpenAIEmbeddings and defines a __call__ method that simply delegates to OpenAIEmbeddings.embed_documents().

    A small note: unless your OpenAI API key is already available in the environment (e.g. loaded from a .env file), you'll probably need to pass it explicitly via the openai_api_key parameter.
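
    If you keep the key in a .env file, loading it before instantiating the embeddings is enough. A minimal sketch, assuming the python-dotenv package and an OPENAI_API_KEY entry in .env:

    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads .env and puts OPENAI_API_KEY into the process environment
    api_key = os.environ["OPENAI_API_KEY"]  # pass this as openai_api_key below

    The subclass below then only needs the key passed through (or picked up from the environment).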

    import chromadb
    from langchain_openai import OpenAIEmbeddings
    
    class CustomOpenAIEmbeddings(OpenAIEmbeddings):
    
        def __init__(self, openai_api_key, *args, **kwargs):
            super().__init__(openai_api_key=openai_api_key, *args, **kwargs)
            
        def _embed_documents(self, texts):
            return super().embed_documents(texts)  # <--- reuse OpenAIEmbeddings' embedding method

        def __call__(self, input):
            return self._embed_documents(input)    # <--- Chroma calls this to get the embeddings
    
    
    client = chromadb.PersistentClient()
    collection = client.get_or_create_collection(
        name='chroma', 
        embedding_function=CustomOpenAIEmbeddings(
            openai_api_key="your very secret OpenAI api key"
        )         # <-- pass the new object instead of OpenAIEmbeddings()
    )
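
    With that in place the collection behaves as usual. A quick sanity check (the ids and document texts below are made up):

    collection.add(
        ids=["doc-1", "doc-2"],
        documents=["Chroma stores embeddings.", "LangChain wraps the OpenAI embedding API."],
    )
    results = collection.query(query_texts=["What stores embeddings?"], n_results=1)
    print(results["documents"])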
    

    Using the underlying OpenAI embeddings client also works (it is accessible via self.client). In that case we can define CustomOpenAIEmbeddings as below, invoking the client's create() method in a loop, one text at a time.

    class CustomOpenAIEmbeddings(OpenAIEmbeddings):
    
        def __init__(self, openai_api_key, *args, **kwargs):
            super().__init__(openai_api_key=openai_api_key, *args, **kwargs)
    
        def _embed_documents(self, texts):
            # one embeddings API call per document; .data[0].embedding is the vector itself
            embeddings = [
                self.client.create(input=text, model="text-embedding-ada-002").data[0].embedding
                for text in texts
            ]
            return embeddings
            
        def __call__(self, input):
            return self._embed_documents(input)
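
    Either variant can be passed to get_or_create_collection exactly as in the first snippet. To verify the __call__ contract directly (the sample sentence is made up; text-embedding-ada-002 returns 1536-dimensional vectors):

    emb_fn = CustomOpenAIEmbeddings(openai_api_key="your very secret OpenAI api key")
    vectors = emb_fn(["Hello, Chroma!"])   # Chroma invokes the function like this
    print(len(vectors), len(vectors[0]))   # -> 1 1536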