I'm trying to build a RAG pipeline using the Chroma database, but when I try to create the index I get the following error: `AttributeError: 'SentenceTransformer' object has no attribute 'embed_documents'`. I saw that this can apparently be fixed by modifying the Chroma library directly, but I don't have the rights to do that in my environment. Any advice would be appreciated.
The ultimate goal is to use the index as a query engine for a chatbot. This is what I tried:
Code:
```python
from langchain_community.document_loaders import DataFrameLoader
from langchain_community.vectorstores import Chroma
from sentence_transformers import SentenceTransformer

# Load the text chunks and declare which column is to be embedded
chunks = DataFrameLoader(final_df_for_chroma_injection,
                         page_content_column='TEXT').load()

# Create the open-source embedding function
embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L12-v2')

# Load the persist directory where the previous embeddings are stored
# and add the new ones from the chunks
index = Chroma.from_documents(chunks,
                              embedding_model,
                              persist_directory="./chroma_db")
```
This is the error I get:
```
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[47], line 3
1 #-Load the persist directory on which are stored the previous embeddings
2 #-And add the new ones from chunks/embeddings
----> 3 index = Chroma.from_documents(chunks,
4 embedding_model,
5 persist_directory="./chroma_db")
File /opt/anaconda3_envs/abeille_pytorch_p310/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py:778, in Chroma.from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
776 texts = [doc.page_content for doc in documents]
777 metadatas = [doc.metadata for doc in documents]
--> 778 return cls.from_texts(
779 texts=texts,
780 embedding=embedding,
781 metadatas=metadatas,
782 ids=ids,
783 collection_name=collection_name,
784 persist_directory=persist_directory,
785 client_settings=client_settings,
786 client=client,
787 collection_metadata=collection_metadata,
788 **kwargs,
789 )
File /opt/anaconda3_envs/abeille_pytorch_p310/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py:736, in Chroma.from_texts(cls, texts, embedding, metadatas, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
728 from chromadb.utils.batch_utils import create_batches
730 for batch in create_batches(
731 api=chroma_collection._client,
732 ids=ids,
733 metadatas=metadatas,
734 documents=texts,
735 ):
--> 736 chroma_collection.add_texts(
737 texts=batch[3] if batch[3] else [],
738 metadatas=batch[2] if batch[2] else None,
739 ids=batch[0],
740 )
741 else:
742 chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
File /opt/anaconda3_envs/abeille_pytorch_p310/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py:275, in Chroma.add_texts(self, texts, metadatas, ids, **kwargs)
273 texts = list(texts)
274 if self._embedding_function is not None:
--> 275 embeddings = self._embedding_function.embed_documents(texts)
276 if metadatas:
277 # fill metadatas with empty dicts if somebody
278 # did not specify metadata for all texts
279 length_diff = len(texts) - len(metadatas)
File /opt/anaconda3_envs/abeille_pytorch_p310/lib/python3.10/site-packages/torch/nn/modules/module.py:1688, in Module.__getattr__(self, name)
1686 if name in modules:
1687 return modules[name]
-> 1688 raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'SentenceTransformer' object has no attribute 'embed_documents'
```
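Use `SentenceTransformerEmbeddings` instead of `SentenceTransformer`, or simply `HuggingFaceEmbeddings` (in `langchain_community` the former is an alias of the latter), e.g. `HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L12-v2")`. LangChain's `Chroma` calls `embed_documents()` on the embedding object (line 275 of the traceback), which LangChain's embedding wrappers provide; a raw `SentenceTransformer` only exposes `encode()`, hence the `AttributeError`. No change to the Chroma library is needed.

Reference: https://python.langchain.com/docs/integrations/text_embedding/sentence_transformers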
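If you would rather keep the `SentenceTransformer` object you already load, another way to avoid editing the Chroma library is a thin wrapper that exposes the two methods LangChain's `Chroma` calls: `embed_documents` (seen in `add_texts` in the traceback) and `embed_query` (used when searching). The `SentenceTransformerAdapter` class below is hypothetical, not part of any library, and the sketch assumes that `Chroma` only needs these two methods and does not check the object's type. Subclassing `langchain_core.embeddings.Embeddings` instead of `object` would be the more formal variant.

```python
from typing import List, Sequence


class SentenceTransformerAdapter:
    """Hypothetical wrapper exposing the methods LangChain's Chroma calls.

    `model` is assumed to be a sentence_transformers.SentenceTransformer,
    or any object with a compatible `encode` method.
    """

    def __init__(self, model):
        self.model = model

    def embed_documents(self, texts: Sequence[str]) -> List[List[float]]:
        # encode() on a list returns one vector per text (a 2-D NumPy array
        # for SentenceTransformer); convert each row to plain Python floats.
        vectors = self.model.encode(list(texts))
        return [[float(x) for x in row] for row in vectors]

    def embed_query(self, text: str) -> List[float]:
        # encode() on a single string returns a single 1-D vector.
        return [float(x) for x in self.model.encode(text)]


# Usage with the question's objects (requires sentence-transformers and
# langchain-community; not executed here):
# embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L12-v2')
# index = Chroma.from_documents(chunks,
#                               SentenceTransformerAdapter(embedding_model),
#                               persist_directory="./chroma_db")
```

The wrapper keeps the model loading unchanged and only adapts the interface; using `HuggingFaceEmbeddings` directly is simpler if you do not need the raw model object elsewhere.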