azure azure-cognitive-search vector-database azure-ai azure-ai-search

Normalizing search scores on Azure AI Search


I have an index on Azure AI Search that contains one English term (e.g. "white wine", "grapes", "chocolate cake", ...) per document. The index has a vector field, and indexing ran without problems for 100k documents.

My use case is to find the term closest to the one entered by the user and give the match a score (0-100%). When I run the following query in Search Explorer in the Azure Portal against my index:

{
  "search": "Winery products",
  "count": true,
  "vectorQueries": [
    {
      "kind": "text",
      "text": "Winery products",
      "fields": "vectorTextEnglish"
    }
  ]
}

I get the right results. Note that the top score is 0.031:

{
  "@odata.context": "https://me.search.windows.net/indexes('myindex')/$metadata#docs(*)",
  "@odata.count": 75,
  "@search.nextPageParameters": {
    "select": "chunk_id,Term,MyReference,parent_id",
    "count": true,
    "skip": 50,
    "vectorQueries": [
      {
        "kind": "text",
        "k": null,
        "oversampling": null,
        "fields": "vectorTextEnglish",
        "vector": [],
        "text": "Winery products",
        "url": null,
        "base64Image": null,
        "exhaustive": null,
        "weight": null,
        "filterOverride": null,
        "threshold": null
      }
    ]
  },
  "value": [
    {
      "@search.score": 0.0317540317773819,
      "chunk_id": "xxxx",
      "Term": "Alcoholic wines",
      "MyReference": "00123",
      "parent_id": "yyyyy"
    },
    {
      "@search.score": 0.03159204125404358,
      ...
    },

However, if I query a random string such as asdfjiwefowfwe, I get a very similar top score of 0.030:

{
  "@odata.context": "https://me.search.windows.net/indexes('myindex')/$metadata#docs(*)",
  "@odata.count": 93,
  "@search.nextPageParameters": {
    "select": "chunk_id,Term,MyReference,parent_id",    
    "count": true,
    "skip": 50,
    "vectorQueries": [
      {
        "kind": "text",
        "k": null,
        "oversampling": null,
        "fields": "vectorTextEnglish",
        "vector": [],
        "text": "asdfjiwefowfwe",
        "url": null,
        "base64Image": null,
        "exhaustive": null,
        "weight": null,
        "filterOverride": null,
        "threshold": null
      }
    ]
  },
  "value": [
    {
      "@search.score": 0.03083491325378418,
      "chunk_id": "xxxxxx",
      "Term": "Ash",
      "MyReference": "00422",
      "parent_id": "yyyyy"
    },
    {
      "@search.score": 0.029877368360757828,
      ...
    },

I would like to normalize the score of the match to 0-100, but I don't understand how a random string gets the same score as a good match. Can anyone help me understand this, and guide me on how to give a higher score to a good match and 0 to random strings?

I tried setting some thresholds, but since the scores are so close to each other, that is impossible. I also tried semantic ranking, but it is even more confusing: random strings get a reranker score of 1.8 while a perfect match is perhaps 2.4.


Solution

  • First, why the scores look alike: your query combines full-text search ("search") with a vector query, so the @search.score is a hybrid score computed with Reciprocal Rank Fusion (RRF). RRF depends only on each document's rank in the merged result lists, not on how similar it actually is, and a vector query always returns its k nearest neighbors, so even a nonsense query produces top scores around 0.03.
  • To get a meaningful score, add the parameter "debug": "all" to your request. The response will then include a new property such as "vectorSimilarity": "0.998", which ranges from 0 to 1. In most cases you can then ignore the keyword score, since vector search is very accurate. Semantic ranking is overkill for most use cases.
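A minimal sketch of how this could be wired up, assuming the response carries a per-document vector similarity as described above. The endpoint, index name, and the 0.80 cut-off are placeholders you would substitute and tune on your own data:

```python
# Placeholders (assumptions): substitute your own service endpoint and index.
ENDPOINT = "https://me.search.windows.net"
INDEX = "myindex"

def build_query(text: str) -> dict:
    """Hybrid query payload with debug info enabled, so the service
    returns the raw vector similarity alongside the RRF score."""
    return {
        "search": text,
        "count": True,
        "debug": "all",  # ask for per-document debug scores
        "vectorQueries": [
            {"kind": "text", "text": text, "fields": "vectorTextEnglish"},
        ],
    }

def similarity_to_percent(similarity: float, floor: float = 0.80) -> float:
    """Map a vector similarity in [0, 1] to a 0-100% match score.

    Anything below `floor` (an assumed cut-off; calibrate it by sampling
    good matches vs. random strings on your index) counts as a non-match
    and scores 0; the rest is rescaled linearly to 0-100.
    """
    if similarity < floor:
        return 0.0
    return round(100.0 * (similarity - floor) / (1.0 - floor), 1)
```

You would POST the payload to `{ENDPOINT}/indexes/{INDEX}/docs/search?api-version=...` with your `api-key` header, read each hit's vector similarity from the debug information, and run it through `similarity_to_percent`. With this scaling, a similarity of 0.998 maps to 99.0%, while a random-string similarity below the floor maps to 0.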