search collections vector-database milvus

Issue with Optimizing Search Performance for Large Vector Collections in Milvus

I have a collection of around 250 million 768-dimensional vectors, indexed with IVF_FLAT and using strong consistency. I'm running Milvus 2.3.4 with 20 query nodes. I'm looking for advice on improving search performance, because searches are currently too slow: the 95th-percentile latency is 1 second, and some searches take even longer.

I have questions about two parameters:

When loading the collection using:

collection.load(replica_number=self.__replicas)

I set replica_number = 1. Does this setting correlate with the number of query nodes? How can I determine an optimal value for this parameter?

The nprobe parameter:

search_params = {
    "index_type": self.index_type,
    "metric_type": self.metric_type,
    "params": {"nprobe": 64}
}

How can I adjust nprobe? Is there a formula that takes into account the number of vectors or other factors?

Additionally, regarding the nlist parameter:

index_params = {
    "metric_type": self.metric_type, 
    "index_type": self.index_type,
    "params": {'nlist': 2048}
}

How can I adjust nlist? Is there a specific formula for this?

Please note that the collection is constantly scaling, which needs to be considered in any adjustments.

Solution

To improve search performance for this collection, I recommend tuning the following parameters.

Replicas: loading with replica_number=1 keeps a single copy of the data, spread across your query nodes. A higher value, for example replica_number=2, loads additional copies so that more nodes can serve requests in parallel. This mainly helps when many searches run concurrently. Each replica is a full copy of the data, so the value cannot exceed the number of query nodes, and memory use grows with it.

nlist: a recommended starting point is 4 * sqrt(n), which balances index build time against search performance. Note that the Milvus guidance defines n as the number of vectors in a segment rather than in the whole collection. Because segment size is capped, this value should not need to change as the collection grows.

nprobe: your current value of 64 has to be tuned empirically, because nprobe trades accuracy against speed. A larger nprobe scans more clusters, which generally increases accuracy at the cost of longer search times. Test a range of values against your own data and keep the smallest one that gives acceptable accuracy.

Index type: consider the HNSW index, which often provides faster query times than IVF_FLAT, at the cost of higher memory use.

Consistency: switching the consistency level from Strong to Bounded or Eventually can reduce latency by approximately 200 ms to 400 ms.
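As a rough illustration of the nlist rule of thumb and of testing different nprobe values, here is a minimal sketch. The helper names (suggested_nlist, sweep_nprobe, milvus_search) are hypothetical, and the field name "embedding", the L2 metric, and limit=10 are assumptions to replace with your own schema. The only Milvus call used is pymilvus Collection.search, with consistency_level passed as a keyword argument. The sweep measures latency only; accuracy has to be checked separately against known-good results.

```python
import math
import time


def suggested_nlist(n: int) -> int:
    """Rule of thumb from the answer: nlist ~= 4 * sqrt(n).

    n is the vector count the rule is applied to (Milvus's
    documentation applies it per segment, not per collection).
    """
    return max(1, round(4 * math.sqrt(n)))


def sweep_nprobe(search_fn, queries, nprobe_values):
    """Run search_fn(query, nprobe) over all queries for each nprobe.

    Returns {nprobe: p95 latency in seconds}, using the nearest-rank
    percentile of the measured wall-clock times.
    """
    results = {}
    for nprobe in nprobe_values:
        latencies = []
        for query in queries:
            start = time.perf_counter()
            search_fn(query, nprobe)
            latencies.append(time.perf_counter() - start)
        latencies.sort()
        rank = max(0, math.ceil(0.95 * len(latencies)) - 1)
        results[nprobe] = latencies[rank]
    return results


def milvus_search(collection, query, nprobe):
    """One search against a loaded pymilvus Collection.

    Assumptions: the vector field is called "embedding", the metric
    is L2, and the top 10 hits are wanted. Adjust to your schema.
    """
    return collection.search(
        data=[query],
        anns_field="embedding",  # hypothetical field name
        param={"metric_type": "L2", "params": {"nprobe": nprobe}},
        limit=10,
        consistency_level="Bounded",  # relaxed from Strong
    )


if __name__ == "__main__":
    # 262144 vectors in a segment gives 4 * 512 = 2048
    print(suggested_nlist(262_144))
```

To use it against a live cluster, pass a small wrapper such as lambda q, n: milvus_search(collection, q, n) as search_fn, together with a sample of real query vectors and a list like [8, 16, 32, 64, 128].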