python docker streamlit qdrant

How to configure Qdrant data persistence and reload


I'm trying to build an app with Streamlit that uses the Qdrant Python client.

To run Qdrant, I'm just using:

docker run -p 6333:6333 qdrant/qdrant

I have wrapped the client in something like this:

class Vector_DB:
    def __init__(self) -> None:
        self.collection_name = "__TEST__"
        self.client = QdrantClient("localhost", port=6333,path = "/home/Desktop/qdrant/qdrant.db")

But I'm getting this error:

Storage folder /home/Desktop/qdrant/qdrant.db is already accessed by another instance of Qdrant client. If you require concurrent access, use Qdrant server instead.

I suspect that Streamlit is creating multiple instances of this class. However, if I try to load the DB from a snapshot, like:

    class Vector_DB:
        def __init__(self) -> None:
            self.client = QdrantClient("localhost", port=6333)
            self.client.recover_snapshot(
                collection_name="__TEST__",
                location="http://localhost:6333/collections/__TEST__/snapshots/__TEST__-8742423504815750-2023-10-30-12-04-14.snapshot",
            )

it works. It seems I'm missing something important about how to configure it. What is the proper way to set up Qdrant so that I can store some embeddings, turn off the machine, and reload them?


Solution

  • You mention using the Qdrant server, to which you'd like to connect with the Python client.

    There are two problems in your question. Let me go over both:

    1. Persist data in Qdrant server:
    Started this way, a Qdrant server stores its data inside the Docker container's own filesystem. That filesystem is ephemeral: every docker run starts a fresh container, so data written by a previous container is not carried over. To persist data you must specify a mount. Qdrant then writes to the mount instead of the container filesystem. You could configure a mount using the -v flag like this [1]:

    docker run -p 6333:6333 \
        -v $(pwd)/qdrant_storage:/qdrant/storage:z \
        qdrant/qdrant
    

    With the mount in place, data is automatically persisted and reloaded when you stop or restart the Qdrant container. You don't have to take extra measures for this.
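
One way to check that persistence works is to store a point, restart the container with the same mount, and count the points again. The following is a minimal sketch, assuming the qdrant-client package is installed and the server from the command above is listening on localhost:6333. The helper names ensure_collection and count_points, the vector size of 4, and the sample point are illustrative assumptions, not part of the original setup.

```python
def ensure_collection(client, name, vector_size):
    """Create the collection only if the server does not already have it.

    Returns True when the collection was created, False when it was reused.
    """
    existing = {c.name for c in client.get_collections().collections}
    if name in existing:
        return False  # data from an earlier run is still available
    # Imported lazily so the helper can also be exercised with a stub client.
    from qdrant_client.models import Distance, VectorParams

    client.create_collection(
        collection_name=name,
        vectors_config=VectorParams(size=vector_size, distance=Distance.COSINE),
    )
    return True


def count_points(client, name):
    """Return the exact number of points stored in the collection."""
    return client.count(collection_name=name, exact=True).count


def main():
    # Assumes a Qdrant server is reachable on localhost:6333.
    from qdrant_client import QdrantClient
    from qdrant_client.models import PointStruct

    client = QdrantClient(host="localhost", port=6333)
    created = ensure_collection(client, "__TEST__", vector_size=4)
    if created:
        # Hypothetical sample point, inserted only on the first run.
        client.upsert(
            collection_name="__TEST__",
            points=[
                PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"doc": "example"})
            ],
        )
    print("created:", created, "points:", count_points(client, "__TEST__"))
```

Call main() once, stop and restart the container with the same -v mount, then call it again. If the mount is set up correctly, the second run should report that the collection was reused and show the same point count.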

    2. Qdrant server versus local mode:
    Qdrant supports two operating modes: the Qdrant server and local mode. You're running the Qdrant server through Docker. The Python client also supports local mode, which runs inside your Python process without a server and is intended for testing.

    To use a Qdrant server you must specify its location (URL) [2]. You've already specified "localhost", which is correct when the Qdrant server runs on your local machine.

    To use local mode you can either specify ":memory:" or provide a path to persist data [3].

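    Right now you've specified parameters for both. The error you see comes from local mode: because a path was given, the client opened that storage folder directly instead of talking to the server. Pick one mode and stick with it. To use the server, update your client initialization to this: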

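    from qdrant_client import QdrantClient

    class Vector_DB:
        def __init__(self) -> None:
            self.collection_name = "__TEST__"
            # Server mode only: no path argument, so no local storage folder is opened
            self.client = QdrantClient("localhost", port=6333)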
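
To make the "one mode only" rule harder to break by accident, the choice can be made explicit in a small helper. This is a sketch under assumptions: qdrant_client_kwargs and make_client are hypothetical names introduced here, and the helper only builds keyword arguments (host and port for a server, location=":memory:" or path for local mode) that are then passed to QdrantClient.

```python
def qdrant_client_kwargs(mode, host="localhost", port=6333, path=None):
    """Build keyword arguments for QdrantClient for exactly one operating mode."""
    if mode == "server":
        if path is not None:
            # A path selects local mode, so it must not be mixed with a server address.
            raise ValueError("path belongs to local mode; omit it when using a server")
        return {"host": host, "port": port}
    if mode == "local":
        if path is None:
            # ":memory:" keeps everything in RAM; nothing survives the process.
            return {"location": ":memory:"}
        # A filesystem path makes local mode persist its data on disk.
        return {"path": path}
    raise ValueError(f"unknown mode: {mode!r}")


def make_client(mode, **options):
    """Create a QdrantClient for the chosen mode (requires qdrant-client)."""
    from qdrant_client import QdrantClient

    return QdrantClient(**qdrant_client_kwargs(mode, **options))
```

With this, make_client("server") would connect to the Docker container from the question, while make_client("local", path=...) would open a local storage folder that only one client instance can use at a time.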