database artificial-intelligence vector-database milvus

Filtering results by gender: Adding a boolean field schema does not enhance search speed


I have created a collection with the following specifications:

Now I want to add a boolean field to the schema that records the gender associated with each embedding vector, so that I can restrict queries by gender. For instance, I want to retrieve the 50 nearest neighbors that are male. To test this, I generated gender values with equal probability, so half of the collection is male and the other half is female. I benchmarked this scenario, and the findings are outlined below. As the plot shows, filtering by gender brought almost no benefit; in one case, the filtered query was only 1.06 times faster than the unfiltered one.
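For reference, the kind of filtered query described above could be sketched roughly as follows. This is only an illustration, not the code used in the benchmark: the field names `embedding` and `gender`, the convention that `True` means male, the `L2` metric, and the `ef` value are all assumptions, and `collection` is expected to be an already loaded pymilvus `Collection`.

```python
def build_search_kwargs(query_vector, top_k=50, male_only=True):
    """Assemble keyword arguments for pymilvus Collection.search().

    Hypothetical names: "embedding" (vector field) and "gender" (boolean field,
    assumed True for male). Adjust metric_type and ef to match the real index.
    """
    # A boolean expression restricts the candidates; None means no filtering.
    expr = "gender == true" if male_only else None
    return {
        "data": [query_vector],          # one query vector
        "anns_field": "embedding",       # assumed vector field name
        "param": {"metric_type": "L2", "params": {"ef": 64}},  # assumed HNSW settings
        "limit": top_k,                  # number of nearest neighbors to return
        "expr": expr,
    }


def search_nearest(collection, query_vector, top_k=50, male_only=True):
    """Run the (optionally filtered) search on an already loaded collection."""
    return collection.search(**build_search_kwargs(query_vector, top_k, male_only))
```

Calling `search_nearest(collection, vec, male_only=False)` runs the same query without the filter, which gives the unfiltered baseline for a comparison like the one above.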


Solution

  • Adding an index on the boolean field is unlikely to help much. Because the field has low cardinality, an index would improve performance by less than 50% in your case, and most of the query time is spent in the HNSW search anyway. In fact, boolean filtering by itself is very fast (under 1 ms) and doesn't really need an index.
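To see why speeding up only the scalar filter has little effect when HNSW dominates, a rough Amdahl-style estimate can help. The sketch below is plain arithmetic, not a Milvus measurement; the filter shares in the loop are made-up illustrative values, and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(filter_fraction, filter_speedup):
    """Estimate the whole-query speedup when only the filter part gets faster.

    filter_fraction: share of query time spent on scalar filtering (0..1).
    filter_speedup:  how many times faster the filtering step becomes.
    """
    if not 0.0 <= filter_fraction <= 1.0:
        raise ValueError("filter_fraction must be between 0 and 1")
    if filter_speedup <= 0:
        raise ValueError("filter_speedup must be positive")
    # The remaining (1 - filter_fraction) of the time, e.g. HNSW traversal, is unchanged.
    return 1.0 / ((1.0 - filter_fraction) + filter_fraction / filter_speedup)


if __name__ == "__main__":
    # Illustrative shares only: even an infinitely fast filter barely helps.
    for fraction in (0.01, 0.05, 0.10):
        bound = overall_speedup(fraction, float("inf"))
        print(f"filter share {fraction:.0%}: at most {bound:.3f}x overall")
```

If the filter accounts for only a few percent of the query time, the upper bound stays close to 1x no matter how well the boolean field is indexed.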