Elastic KNN search num candidates vs shard size

I am looking into using Elastic KNN search feature and from what I see this is how we query ES for KNN search.

GET my-index/_knn_search
{
  "knn": {
    "field": "image_vector",
    "query_vector": [0.3, 0.1, 1.2],
    "k": 10,
    "num_candidates": 100
  },
  "_source": ["name", "file_type"]
}

Here num_candidates has a max limit of 10000 and from ES documents I see this - The number of nearest neighbor candidates to consider per shard. Cannot exceed 10,000. Elasticsearch collects num_candidates results from each shard, then merges them to find the top k results. Increasing num_candidates tends to improve the accuracy of the final k results.

The above is not very clear to me. Here are some questions:

How is the 10,000 candidates chosen?
If we have 1M vector documents, to search across all of these should we pick like 100 shards, so that each shard has max 10k documents? We need very pretty good recall on the retrieved results.
Their documents on picking shard strategy says too many smaller shards is bad and they have their own overhead. So how do we pick shard sizes when we have this limit of 10k candidates per shards for KNN?

I see similar questions here but there is no proper answer.

Solution

You need to distinguish between kNN (k Nearest Neighbors) and exact search.

With exact search (i.e. brute-force search by using a script_score query), if you have 1M vectors, your query vector will be compared against each of them and the results you'll get are the real 10 closest vectors to your query vector.

With kNN search, also called approximate nearest neighbors (ANN) it's a bit different, because your 1M vectors will be indexed in a dedicated structure depending on your vector search engine (Inverted File Index, KD trees, Hierarchical Navigable Small Worlds, etc). For Elasticsearch, which is based on Apache Lucene, vectors are indexed in a Hierarchical Navigable Small Worlds structure. At search time, the HNSW algorithm will try to figure out the k nearest neighbors to your query vector based on their closest distance, or highest similarity. It might find the real ones, or not, hence the approximate nature of these search algorithms. In order to decrease the odds of "or not", the idea is to visit a higher amount of vectors, and that's the role of num_candidates.

The idea is NOT to pick a value of num_candidates that is high enough to visit all vectors in your database, as that would boil down to make an exact search and it would make no sense to use an ANN algorithm for this, just run an exact search, pay the execution price and that's it.

The shard sizing document you are referring to does not pertain to kNN search. kNN search has its own tuning strategy that is different. As the HNSW graph needs to be built per segment and each segment needs to be searched, the ideal situation would be to have a single segment to search, i.e. one shard with one force-merged segment. Depending on your data volume and if you're constantly indexing new vectors, it might not be feasible. But you should optimize in that direction, i.e. less shards with less segments as much as possible.

Let's say that you manage to get your 1M vectors into a single shard with a single segment, there's no reason to have a high num_candidates, because the HNSW search algorithm has a pretty good recall rate and doesn't need to visit more than a certain amount of candidates (to be figured out depending on your use case, constraints, data volume, SLA, etc) in order to find the top k ones.

Update March 27th, 2024: It is worth noting that as of ES 8.13, k and num_candidates have become optional and their respective values are set to sensible default, i.e.:

k defaults to the value of size
num_candidates defaults to the minimum of 10000 and 1.5 x k

So by default:

size = k = 10
num_candidates = 15