[SOLVED] Is PyMilvus client thread-safe & fork-safe?

I'm considering Milvus as the vector store for my Flask-based project and have been reading the PyMilvus (Python SDK) documentation. I haven't found answers to the following:

Is PyMilvus thread-safe?
Is PyMilvus fork-safe?
How does connection pooling work in the SDK?

Could you help me sort this out?

The official documentation doesn't say much about it.

Solution

The current PyMilvus release (v2.3.x) doesn't provide a thread pool or a connection pool. Instead, PyMilvus has a global object, "connections", that maintains the client-to-server connections.

Call connections.connect() to create a connection:

from pymilvus import connections

# HOST and PORT are placeholders for your Milvus server address.
connections.connect(host=HOST, port=PORT, alias="xxx")

This method takes a parameter "alias", which is the name of the connection. Internally, the "connections" object maintains a map from names to connections. If you don't provide an "alias", the connection is registered under the name "default".

When you declare a collection, the "using" parameter specifies which connection to use by name. If you don't provide "using", the collection uses the "default" connection. All of that Collection's methods go through this connection.

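from pymilvus import Collection

# Every call on this Collection goes through the connection named "xxx".
collection = Collection(name=collection_name, using="xxx")
collection.insert(...)   # arguments omitted
collection.search(...)   # arguments omitted
......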
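
To picture how the alias lookup described above behaves, here is a toy model. It is not PyMilvus code: ToyConnections and its methods are hypothetical stand-ins, a plain dict takes the place of a real connection, and port "19531" is an arbitrary example value. It only illustrates that a missing "alias" or "using" falls back to the name "default".

```python
class ToyConnections:
    """Toy stand-in for the alias-to-connection map; not PyMilvus code."""

    def __init__(self):
        self._by_alias = {}

    def connect(self, alias="default", **params):
        # A plain dict stands in for a real client-to-server connection.
        self._by_alias[alias] = dict(params)

    def get(self, using="default"):
        # Looking up a name that was never connected is an error.
        if using not in self._by_alias:
            raise KeyError(f"no connection named {using!r}")
        return self._by_alias[using]


toy = ToyConnections()
toy.connect(host="localhost", port="19530")               # stored as "default"
toy.connect(host="localhost", port="19531", alias="xxx")  # stored as "xxx"
```

Here toy.get() resolves to the first connection and toy.get(using="xxx") to the second, which mirrors how a Collection picks its connection by name.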

The connection object is thread-safe, so you can call the collection's methods from different threads. It cannot, however, be shared across processes. If you fork a subprocess, make sure each subprocess creates its own connection by calling connections.connect().
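
For a Flask app served by forking workers, one way to follow this advice is to connect lazily and remember which process did it. The sketch below is an assumption, not part of PyMilvus: ensure_connected is a hypothetical helper, and "connect" is whatever callable you pass in. It does not handle a fork that happens while another thread holds the lock.

```python
import os
import threading

_lock = threading.Lock()
_connected_pid = None  # PID of the process that last opened the connection


def ensure_connected(connect, alias="default"):
    """Run connect(alias) once per process, and again in a forked child.

    `connect` is a caller-supplied callable; this helper is hypothetical
    and not part of PyMilvus.
    """
    global _connected_pid
    pid = os.getpid()
    with _lock:  # only one thread at a time may perform the connect
        if _connected_pid != pid:
            # First call in this process, or we are in a forked child
            # that inherited the parent's PID marker.
            connect(alias)
            _connected_pid = pid
    return alias
```

With PyMilvus, the callable could be something like lambda a: connections.connect(alias=a, host=HOST, port=PORT), invoked before the first Collection is used in each worker. Threads within one process then share that single connection, while each forked process opens its own.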