python-3.x, langchain, py-langchain, qdrant, qdrantclient

Initialise qdrant for langchain


I want to experiment with adding my existing Qdrant vector database to LangChain for a ChatGPT project. However, I cannot find a way to initialise the Qdrant object without providing docs and embeddings. That seems odd to me: the docs and embeddings already exist in the database, so I should be able to provide just my database URL, as I do when interacting via the Qdrant Python client:

QdrantClient(host=host, port=port)

In the official LangChain documentation I can only find examples where I have to provide the data when creating the object, like so:

url = "<---qdrant url here --->"
qdrant = Qdrant.from_documents(
    docs,
    embeddings,
    url=url,  # connection options are passed as keyword arguments
    collection_name="my_documents",
)

Their documentation also states that:

Both Qdrant.from_texts and Qdrant.from_documents methods are great to start using Qdrant with Langchain. In the previous versions the collection was recreated every time you called any of them. That behaviour has changed. Currently, the collection is going to be reused if it already exists. Setting force_recreate to True allows to remove the old collection and start from scratch.

I find this strange: the collection is being reused (as I want), but I still have to provide docs and embeddings.

I have also checked Qdrant's official documentation on the matter, and it provides a half solution where I "only" have to provide the embeddings:

import qdrant_client
from langchain.vectorstores import Qdrant
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

client = qdrant_client.QdrantClient(
    "<qdrant-url>",
    api_key="<qdrant-api-key>", # For Qdrant Cloud, None for local instance
)

doc_store = Qdrant(
    client=client, collection_name="texts", 
    embeddings=embeddings,
)

If anyone has a solution for this, I would be happy to receive some help.


Solution

  • It turns out I misunderstood the documentation.

    The embeddings object should not be a list of embeddings but rather an embedding model, such as Sentence-BERT or OpenAI's embedding model.

    You can pass a client from Qdrant's official Python library (qdrant-client) to LangChain's Qdrant class, which lets you create the LangChain vector store without providing the documents each time.

    Here is an example from Qdrant's documentation on how it should be done:

    import qdrant_client
    from langchain.vectorstores import Qdrant
    from langchain.embeddings import HuggingFaceEmbeddings
    
    # The embedding model that will be used by the collection
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    )
    
    # The Qdrant client from the OFFICIAL qdrant-client Python library, NOT from LangChain
    client = qdrant_client.QdrantClient(
        "<qdrant-url>",
        api_key="<qdrant-api-key>", # For Qdrant Cloud, None for local instance
    )
    
    # The LangChain vector store that wraps the client above
    doc_store = Qdrant(
        client=client,
        collection_name="texts",
        embeddings=embeddings,
    )
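
To round this off, here is a minimal sketch of how the existing collection could then be queried without uploading anything. The helper names open_existing_collection and top_texts are hypothetical, made up for illustration. The sketch assumes the same langchain.vectorstores import as above. As far as I know, LangChain's Qdrant class reads the document text from the payload key page_content and the metadata from metadata by default, so the two payload-key arguments are exposed in case the collection was filled outside LangChain and uses other field names.

```python
def open_existing_collection(
    url,
    collection_name,
    embeddings,
    api_key=None,
    content_payload_key="page_content",
    metadata_payload_key="metadata",
):
    """Wrap an existing Qdrant collection in a LangChain vector store.

    Hypothetical helper for illustration. Nothing is uploaded here; the
    embedding model is only used later to embed search queries.
    """
    # Imported lazily so this sketch can be loaded without the libraries.
    import qdrant_client
    from langchain.vectorstores import Qdrant

    client = qdrant_client.QdrantClient(url, api_key=api_key)
    return Qdrant(
        client=client,
        collection_name=collection_name,
        embeddings=embeddings,
        # Payload fields that hold the text and the metadata of each point.
        content_payload_key=content_payload_key,
        metadata_payload_key=metadata_payload_key,
    )


def top_texts(doc_store, query, k=3):
    """Return the text of the k stored documents most similar to the query."""
    docs = doc_store.similarity_search(query, k=k)
    return [doc.page_content for doc in docs]
```

With the embeddings object from the example above, this could be called as top_texts(open_existing_collection("<qdrant-url>", "texts", embeddings), "my question"). The embedding model passed in should be the same one that produced the vectors already stored in the collection, otherwise the query vectors will not be comparable to them.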