Tags: langchain, large-language-model, py-langchain, chromadb

langchain: how to use a custom embedding model locally?


I am trying to use a custom embedding model in LangChain with ChromaDB. I can't seem to find a way to use the base embedding class without going through some other provider (like OpenAIEmbeddings or HuggingFaceEmbeddings). Am I missing something?

On the LangChain docs page it says that the base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. So I figured there must be a way to create another class on top of it and override/implement those methods with our own. But how do I do that?

I tried to somehow use the base embeddings class but am unable to create a new embedding object/class on top of it.


Solution

  • You can create your own class and implement the methods such as embed_documents. If you want to adhere strictly to the typing, you can extend the Embeddings base class (from langchain_core.embeddings import Embeddings) and implement its abstract methods, embed_documents and embed_query. You can find the class implementation in the langchain_core source.

    Below is a small working custom embedding class I used with semantic chunking.

    from typing import List

    from langchain_core.embeddings import Embeddings
    from langchain_experimental.text_splitter import SemanticChunker
    from sentence_transformers import SentenceTransformer


    class MyEmbeddings(Embeddings):
        def __init__(self, model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
            self.model = SentenceTransformer(model_name)

        def embed_documents(self, texts: List[str]) -> List[List[float]]:
            # encode() accepts a list of texts and batches internally
            return self.model.encode(texts).tolist()

        def embed_query(self, text: str) -> List[float]:
            # Required by the Embeddings interface; retrieval (e.g. against
            # ChromaDB) calls this to embed the search query
            return self.model.encode(text).tolist()


    embeddings = MyEmbeddings()

    splitter = SemanticChunker(embeddings)
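
    If you just want to verify the shape of the interface without downloading a model, a toy stand-in like the one below works too. The hashing scheme here is purely illustrative (it carries no semantic meaning, and the FakeEmbeddings name is my own); the point is that any object with these two methods can be passed wherever LangChain expects an embeddings object.

    ```python
    import hashlib
    from typing import List


    class FakeEmbeddings:
        """Deterministic stand-in mapping each text to a fixed-size float vector.

        Illustrative only: it honors the same two-method contract LangChain
        expects (embed_documents for lists of texts, embed_query for one text)
        but the vectors have no semantic meaning.
        """

        def __init__(self, dim: int = 8):
            self.dim = dim

        def _embed(self, text: str) -> List[float]:
            # Derive `dim` floats in [0, 1] from a SHA-256 digest of the text.
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            return [digest[i % len(digest)] / 255.0 for i in range(self.dim)]

        def embed_documents(self, texts: List[str]) -> List[List[float]]:
            return [self._embed(t) for t in texts]

        def embed_query(self, text: str) -> List[float]:
            return self._embed(text)


    emb = FakeEmbeddings(dim=4)
    vecs = emb.embed_documents(["hello", "world"])
    print(len(vecs), len(vecs[0]))  # 2 4
    ```

    Swapping FakeEmbeddings for MyEmbeddings (or vice versa) requires no other code changes, which is the whole benefit of coding against the Embeddings interface.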