javascript, database, mongodb, non-relational-database, mongodb-atlas-search

Multiple documents having equal search score in MongoDB Atlas Search


Is there a way to boost the score for an exact match in Atlas Search?

I'm having issues getting the right/best translation for 'hi' from English to French. After some debugging I discovered that the first three (3) documents returned from my aggregation each have the same score of '2.362138271331787'.

I'm expecting 'hi' to have a higher score since it is an exact match for the search query, but 'it’s his' and 'his' seem to have the same score as 'hi'.

Here's my search query:

const searchOption = [
  {
    $search: {
      text: {
        query: 'hi',
        path: 'english',
      },
    },
  },
  { $project: { _id: 0, french: 1, english: 1, score: { $meta: "searchScore" } } },
  { $limit: 5 },
];

const result = await Greetings.aggregate(searchOption, { cursor: { batchSize: 5 } }).toArray();

Here are the documents returned. The list is ordered by search score:

[
  {
    "english": "it’s his",
    "french": "c'est le sien",
    "score": 2.362138271331787
  },
  {
    "english": "hi",
    "french": "salut",
    "score": 2.362138271331787
  },
  {
    "english": "his",
    "french": "le sien",
    "score": 2.362138271331787
  },
  {
    "english": "it’s his failure to arrange his",
    "french": "c'est son incapacité à organiser son",
    "score": 2.2482824325561523
  },
  {
    "english": "it’s his failure to arrange his time",
    "french": "c'est son incapacité à organiser son temps",
    "score": 2.0995540618896484
  }
]

Solution

  • This is a known limitation of Atlas Search and the solution is mentioned here: https://www.mongodb.com/docs/atlas/atlas-search/autocomplete/#limitations

    The lucene.keyword analyzer on a string type is helpful for exact matches in scenarios where score fidelity is essential.

    In short, the english path should be defined in the index definition as both string and autocomplete, like this:

    [
      {"type": "string"},
      {"type": "autocomplete"}
    ]
    

    The above assumes that you are not using a language analyzer for either the autocomplete or the string mapping. A language analyzer is probably not ideal for the string mapping.

    Then, on the query side, use a compound query in which the text and autocomplete operators are both should clauses. Boost the text clause, not the autocomplete clause, so that an exact match scores higher than a partial one.
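
For illustration, here is a rough sketch of how the index mapping and the compound query described above might fit together. It is not taken from the original answer: the `dynamic: false` setting, the explicit `lucene.keyword` analyzer on the string mapping, the boost value of 3, and the helper name `buildSearchPipeline` are all assumptions chosen for the example, so adjust them to your own index.

```javascript
// Hypothetical Atlas Search index definition for the collection.
// The `english` path is mapped twice: as a string (exact matches via
// the keyword analyzer) and as autocomplete (partial matches).
const indexDefinition = {
  mappings: {
    dynamic: false, // assumption: only `english` needs to be searchable
    fields: {
      english: [
        { type: 'string', analyzer: 'lucene.keyword' },
        { type: 'autocomplete' },
      ],
    },
  },
};

// Hypothetical helper that builds the aggregation pipeline.
// `exactBoost` is an arbitrary multiplier, not a recommended value.
function buildSearchPipeline(query, exactBoost = 3) {
  return [
    {
      $search: {
        compound: {
          should: [
            {
              // Exact match against the keyword-analyzed string mapping,
              // boosted so it outranks partial matches.
              text: {
                query,
                path: 'english',
                score: { boost: { value: exactBoost } },
              },
            },
            {
              // Partial match against the autocomplete mapping, not boosted.
              autocomplete: {
                query,
                path: 'english',
              },
            },
          ],
        },
      },
    },
    { $project: { _id: 0, french: 1, english: 1, score: { $meta: 'searchScore' } } },
    { $limit: 5 },
  ];
}

const pipeline = buildSearchPipeline('hi');
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting pipeline can be passed to the same `aggregate` call used in the question. Keep in mind that `lucene.keyword` indexes the whole field value as a single, case-sensitive term, so with this mapping 'Hi' and 'hi' would not count as the same exact match.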