elasticsearch opensearch elasticsearch-painless

OpenSearch prevent script_score from running more than once


In OpenSearch, I implemented a custom script_score using the Painless scripting language. When my query uses only query.bool.should, the script is called once per document and the returned _score is correct.

However, when I combine both query.bool.should and query.bool.must in my query, the script_score is called twice or three times per document, and the resulting score is the sum of all the calls. This causes the score to be higher than intended.

Why does this happen, and how can I ensure the script is called only once per document when the query uses both should and must? Failing that, how can I stop OpenSearch from summing the results of all calls per document, so that it returns the result of just one call?

For example, in the simplified query below, the script_score source is return Integer.parseInt(doc['_id'].value);. Because the query uses both should and must, the calculated _score for document 6148 is 18444 (i.e. 6148 * 3) instead of 6148.

{
  "from": 0,
  "size": 10,
  "stored_fields": "_none_",
  "docvalue_fields": [
    "_id",
    "_score"
  ],
  "sort": [
    {
      "_score": {
        "order": "asc"
      }
    }
  ],
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "term": { "category_ids": "2" }
            },
            {
              "terms": { "visibility": ["3", "4"] }
            }
          ],
          "should": [
            {
              "ids": {
                "values": [
                  "6148"
                ]
              }
            }
          ],
          "minimum_should_match": 1
        }
      },
      "script_score": {
        "script": {
          "lang": "painless",
          "source": "return Integer.parseInt(doc['_id'].value);"
        }
      }
    }
  }
}

[Screenshot: query and its results in OpenSearch Dashboards]


Solution

  • Answering my own question to help others who might face the same issue in the future. While I still don't understand why the script_score is sometimes called more than once, I was able to fix the scoring.

    To prevent the scores from being summed or multiplied, I added the boost_mode parameter with the value replace, as shown below:

    {
      "query": {
        "function_score": {
          "query": { ... },
          "boost_mode": "replace" // Adding this fixed the issue for me
        }
      }
    }
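
For reference, here is a minimal sketch of the full request body from the question with boost_mode added, written as a Python dict so it can be serialized and sent with any HTTP client. The helper name build_query is hypothetical (it is not part of any OpenSearch client), and the field names are simply copied from the question.

```python
import json


def build_query(boost_mode: str = "replace") -> dict:
    """Build the function_score request body from the question.

    build_query is a hypothetical helper used only for illustration.
    """
    return {
        "from": 0,
        "size": 10,
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"category_ids": "2"}},
                            {"terms": {"visibility": ["3", "4"]}},
                        ],
                        "should": [{"ids": {"values": ["6148"]}}],
                        "minimum_should_match": 1,
                    }
                },
                "script_score": {
                    "script": {
                        "lang": "painless",
                        "source": "return Integer.parseInt(doc['_id'].value);",
                    }
                },
                # "replace" keeps only the script result and ignores
                # the score produced by the bool query.
                "boost_mode": boost_mode,
            }
        },
    }


body = build_query()
print(json.dumps(body, indent=2))
```

Note that boost_mode sits next to query and script_score inside function_score, not inside the script itself.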
    

    I found this solution in the OpenSearch docs (https://opensearch.org/docs/latest/query-dsl/compound/function-score), which say:

    You can specify how the score computed using all functions[1] is combined with the query score in the boost_mode parameter, which takes one of the following values:

    • multiply: (Default) Multiply the query score by the function score.
    • replace: Ignore the query score and use the function score.
    • sum: Add the query score and the function score.
    • avg: Average the query score and the function score.
    • max: Take the greater of the query score and the function score.
    • min: Take the lesser of the query score and the function score.

    [1] Note that boost_mode works whether you have a single function (as in my case) or multiple functions. With multiple functions, you may also want to look at the score_mode parameter, described on the same docs page linked above.
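
To make the boost_mode options concrete, here is a small Python sketch of the arithmetic they describe. It also offers one possible reading of the 18444 figure, which is an assumption and not something confirmed above: a bool query adds up the scores of its matching must and should clauses, so if each of the three clauses in the question contributed 1.0, the query score would be 3.0, and the default multiply mode would give 3.0 * 6148 even if the script ran only once. The function name combine and the per-clause scores are illustrative only.

```python
def combine(query_score: float, function_score: float, boost_mode: str = "multiply") -> float:
    """Combine a query score and a function score as boost_mode describes."""
    modes = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
    }
    return modes[boost_mode](query_score, function_score)


# Assumption: each of the three matching clauses (term, terms, ids)
# contributes a score of 1.0; real scores depend on the field mappings.
clause_scores = [1.0, 1.0, 1.0]
query_score = sum(clause_scores)  # a bool query sums its matching clauses
script_result = 6148.0            # Integer.parseInt(doc['_id'].value)

print(combine(query_score, script_result, "multiply"))  # 18444.0
print(combine(query_score, script_result, "replace"))   # 6148.0
```

Under that assumption, replace gives the intended 6148 because the query score no longer takes part in the result.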