When I try to create a Chroma collection that uses LangChain's OpenAIEmbeddings as its embedding function, I get a ValueError:
ValueError: Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs'])
How do I resolve this error?
The issue showed up after I upgraded Chroma, so the error seems to be caused by LangChain's embedding class not meeting the new embedding-function requirements introduced in recent Chroma releases.
My code:
import chromadb
from langchain_openai import OpenAIEmbeddings
client = chromadb.PersistentClient()
collection = client.get_or_create_collection(
    name='chroma',
    embedding_function=OpenAIEmbeddings()
)
I have langchain==0.1.1, langchain-openai==0.0.3 and chromadb==0.4.22. Judging by GitHub issues, downgrading chromadb to 0.4.15 solves the problem, but since these libraries will keep being upgraded in the coming months, I'd rather find a solution that works with the current version than downgrade Chroma.
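Since version 0.4.16 (as far as I can tell, which fits with 0.4.15 still working), Chroma requires the embedding function to define a __call__() method that takes a single argument named input and returns a list of embeddings. The migration guide linked from the full error message says as much. OpenAIEmbeddings doesn't define __call__ at all, which is why the check sees a generic ['self', 'args', 'kwargs'] signature instead of ['self', 'input'].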
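To see what that check compares, here is a rough, self-contained approximation. It assumes (based on the error text) that Chroma compares the parameter names of the class-level `__call__` of the embedding function with those of its own protocol. `protocol_call`, `params_of_call` and the two dummy classes are hypothetical names used only for illustration; chromadb itself is not imported.

```python
import inspect


def protocol_call(self, input):
    """Stand-in for the signature Chroma >= 0.4.16 expects: (self, input)."""
    raise NotImplementedError


def params_of_call(obj):
    """Parameter names of the __call__ found on obj's class, as a list."""
    return list(inspect.signature(type(obj).__call__).parameters)


class NoCallEmbeddings:
    """Like OpenAIEmbeddings: has embed_documents() but no __call__()."""

    def embed_documents(self, texts):
        return [[float(len(t))] for t in texts]


class CallableEmbeddings(NoCallEmbeddings):
    """Adds __call__ with a parameter named exactly `input`."""

    def __call__(self, input):
        return self.embed_documents(input)


expected = list(inspect.signature(protocol_call).parameters)
for emb in (NoCallEmbeddings(), CallableEmbeddings()):
    # prints "NoCallEmbeddings False", then "CallableEmbeddings True"
    print(type(emb).__name__, params_of_call(emb) == expected)
```

Because names are compared, a `__call__(self, texts)` would fail the same check even though it behaves identically.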
Given that we need a method that returns a list of embeddings, and OpenAIEmbeddings already has one (embed_documents()), the easiest solution I found was to create a custom class that inherits from OpenAIEmbeddings and defines a __call__(self, input) method that delegates to OpenAIEmbeddings.embed_documents.
A small note: unless your OpenAI API key is available in the OPENAI_API_KEY environment variable (for example, loaded from a .env file), you'll need to pass it as the openai_api_key parameter.
import chromadb
from langchain_openai import OpenAIEmbeddings


class CustomOpenAIEmbeddings(OpenAIEmbeddings):

    def __init__(self, openai_api_key, *args, **kwargs):
        super().__init__(*args, openai_api_key=openai_api_key, **kwargs)

    def _embed_documents(self, texts):
        return super().embed_documents(texts)  # <--- use OpenAIEmbeddings' embedding function

    def __call__(self, input):  # <--- the parameter must be named `input` for Chroma's check
        return self._embed_documents(input)  # <--- get the embeddings


client = chromadb.PersistentClient()
collection = client.get_or_create_collection(
    name='chroma',
    embedding_function=CustomOpenAIEmbeddings(
        openai_api_key="your very secret OpenAI api key"
    )  # <-- pass the new object instead of OpenAIEmbeddings()
)
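If you'd rather not subclass a pydantic model, the same delegation can be written as a thin wrapper around any LangChain embeddings object. This is only a sketch: `ChromaEmbeddingAdapter` and `DummyEmbeddings` are hypothetical names, and the only thing assumed about the wrapped object is an `embed_documents(texts)` method returning one vector per text, as `OpenAIEmbeddings` has. `DummyEmbeddings` stands in for the real class so the example runs offline.

```python
from typing import Any, List


class ChromaEmbeddingAdapter:
    """Hypothetical wrapper exposing __call__(self, input) for Chroma >= 0.4.16."""

    def __init__(self, embeddings: Any) -> None:
        # Any object with embed_documents(list[str]) -> list[list[float]],
        # e.g. langchain_openai.OpenAIEmbeddings(...)
        self._embeddings = embeddings

    def __call__(self, input: List[str]) -> List[List[float]]:
        # The parameter must be named `input`: Chroma compares parameter names.
        return self._embeddings.embed_documents(list(input))


class DummyEmbeddings:
    """Offline stand-in for OpenAIEmbeddings, used only to exercise the adapter."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Toy "embedding": [length, number of spaces]
        return [[float(len(t)), float(t.count(" "))] for t in texts]


adapter = ChromaEmbeddingAdapter(DummyEmbeddings())
print(adapter(["hello world", "chroma"]))  # [[11.0, 1.0], [6.0, 0.0]]
```

With the real class you would pass `embedding_function=ChromaEmbeddingAdapter(OpenAIEmbeddings())`, assuming the signature check is the only thing Chroma requires of the object.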
Using the underlying OpenAI embeddings client directly (accessible via self.client) also works. Basically, we can define CustomOpenAIEmbeddings as below, invoking the client's create() method (Embedding.create() in older versions of the openai package) in a loop, like in this example use case.
class CustomOpenAIEmbeddings(OpenAIEmbeddings):

    def __init__(self, openai_api_key, *args, **kwargs):
        super().__init__(*args, openai_api_key=openai_api_key, **kwargs)

    def _embed_documents(self, texts):
        # One API request per text; self.model defaults to "text-embedding-ada-002"
        embeddings = [
            self.client.create(input=text, model=self.model).data[0].embedding
            for text in texts
        ]
        return embeddings

    def __call__(self, input):
        return self._embed_documents(input)
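The loop version sends one request per text. The OpenAI embeddings endpoint also accepts a list of strings as `input`, so a batched variant is possible. The sketch below assumes the openai>=1.0 response shape (a `data` list whose items carry `index` and `embedding`); `embed_batched` is a hypothetical helper, and `StubEmbeddingsClient` is a hypothetical in-memory stand-in for `self.client` so the example runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, List


def embed_batched(client: Any, texts: List[str], model: str) -> List[List[float]]:
    """Embed all texts in one create() call and return vectors in input order."""
    if not texts:
        return []
    response = client.create(input=list(texts), model=model)
    # Sort by `index` so each vector lines up with its input text.
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]


class StubEmbeddingsClient:
    """Hypothetical offline stand-in for OpenAIEmbeddings().client."""

    def __init__(self) -> None:
        self.calls = 0

    def create(self, input: List[str], model: str) -> Any:
        self.calls += 1
        items = [
            SimpleNamespace(index=i, embedding=[float(len(text))])
            for i, text in enumerate(input)
        ]
        # Return items out of order to show why sorting by index matters.
        return SimpleNamespace(data=list(reversed(items)))


stub = StubEmbeddingsClient()
print(embed_batched(stub, ["a", "bbb", "cc"], model="text-embedding-ada-002"))  # [[1.0], [3.0], [2.0]]
print(stub.calls)  # 1
```

Inside the custom class, `_embed_documents` would become `return embed_batched(self.client, texts, self.model)`. Very large lists would still need to be split into chunks to respect the API's per-request limits, which is one reason the first approach, delegating to `embed_documents()` (which already batches by `chunk_size`), is usually the simpler choice.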