In OpenSearch I implemented a custom script_score using the Painless scripting language. When I use only query.bool.should, the script is called once per document and the returned _score is correct.
However, when I combine query.bool.should and query.bool.must in the same query, the script_score is called two or three times per document, and the resulting score is the sum of all the calls. This makes the score higher than intended.
Why does this happen, and how can I ensure the script is called only once per document when the query uses both should and must? Or, failing that, how can I prevent OpenSearch from summing the results of all the calls and have it return the result of just one of them?
For example, in the simplified query below, the script_score source is return Integer.parseInt(doc['_id'].value); however, because the query uses both should and must, the calculated _score for document 6148 is 18444 (i.e. 6148 * 3) instead of 6148.
{
  "from": 0,
  "size": 10,
  "stored_fields": "_none_",
  "docvalue_fields": [
    "_id",
    "_score"
  ],
  "sort": [
    {
      "_score": {
        "order": "asc"
      }
    }
  ],
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "term": { "category_ids": "2" }
            },
            {
              "terms": { "visibility": ["3", "4"] }
            }
          ],
          "should": [
            {
              "ids": {
                "values": [
                  "6148"
                ]
              }
            }
          ],
          "minimum_should_match": 1
        }
      },
      "script_score": {
        "script": {
          "lang": "painless",
          "source": "return Integer.parseInt(doc['_id'].value);"
        }
      }
    }
  }
}
Answering my own question to help others who might face the same issue in the future. While I still don't understand why the script_score gets called more than once in some cases, I was able to fix the scoring.
To prevent the script result from being summed or multiplied, I added the boost_mode: replace parameter, as shown below:
{
  "query": {
    "function_score": {
      "query": { ... },
      "boost_mode": "replace" // Adding this fixed the issue for me
    }
  }
}
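For anyone building the request in client code, here is a minimal Python sketch of the simplified query from the question with boost_mode set to replace. The helper name build_request_body is hypothetical, and actually sending the body to a cluster is left out; it only shows where the parameter sits relative to query and script_score.

```python
import json


def build_request_body(doc_id: str) -> dict:
    """Build the simplified function_score request with boost_mode=replace.

    Hypothetical helper: field names and values are copied from the
    query in the question; nothing here talks to a cluster.
    """
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"category_ids": "2"}},
                            {"terms": {"visibility": ["3", "4"]}},
                        ],
                        "should": [{"ids": {"values": [doc_id]}}],
                        "minimum_should_match": 1,
                    }
                },
                "script_score": {
                    "script": {
                        "lang": "painless",
                        "source": "return Integer.parseInt(doc['_id'].value);",
                    }
                },
                # Sibling of "query" and "script_score": use the script
                # result as the final score and ignore the query score.
                "boost_mode": "replace",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_request_body("6148"), indent=2))
```

Note that boost_mode belongs directly under function_score, not inside script_score or the inner bool query.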
I found this solution in the OpenSearch docs: https://opensearch.org/docs/latest/query-dsl/compound/function-score
You can specify how the score computed using all functions[1] is combined with the query score in the boost_mode parameter, which takes one of the following values:
multiply: (Default) Multiply the query score by the function score.
replace: Ignore the query score and use the function score.
sum: Add the query score and the function score.
avg: Average the query score and the function score.
max: Take the greater of the query score and the function score.
min: Take the lesser of the query score and the function score.
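To make these modes concrete, here is a small Python sketch of the arithmetic each boost_mode value describes. The function name combine_scores is hypothetical and this is not OpenSearch's implementation. The query score of 3.0 is purely an assumption, picked because under the default multiply mode it would reproduce the 18444 reported for document 6148.

```python
def combine_scores(query_score: float, function_score: float,
                   boost_mode: str = "multiply") -> float:
    """Combine a query score and a function score per the quoted boost_mode values."""
    combiners = {
        "multiply": lambda q, f: q * f,       # default
        "replace": lambda q, f: f,            # ignore the query score
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": lambda q, f: max(q, f),
        "min": lambda q, f: min(q, f),
    }
    if boost_mode not in combiners:
        raise ValueError(f"unknown boost_mode: {boost_mode!r}")
    return combiners[boost_mode](query_score, function_score)


if __name__ == "__main__":
    script_result = 6148.0       # what the Painless script returns for _id 6148
    assumed_query_score = 3.0    # assumption, not measured on a real cluster

    print(combine_scores(assumed_query_score, script_result))             # 18444.0
    print(combine_scores(assumed_query_score, script_result, "replace"))  # 6148.0
```

Under that assumption, a single script call multiplied by the query score would give the same number as three summed calls, which is one reason replace yields the expected 6148.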
[1] Note that boost_mode works in both scenarios, whether you have a single function (as in my case) or multiple functions. With multiple functions you might also want to look at the score_mode parameter, described on the same docs page linked above.
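As a rough illustration of where score_mode fits, the Python sketch below combines several function scores into one value before boost_mode is applied. It assumes the score_mode values multiply (default), sum, avg, first, max, and min as I understand them from that docs page. The function name is hypothetical, and the avg branch ignores per-function weights for simplicity.

```python
from functools import reduce
import operator


def combine_function_scores(scores: list[float],
                            score_mode: str = "multiply") -> float:
    """Collapse several function scores into one (simplified, unweighted sketch)."""
    if not scores:
        raise ValueError("at least one function score is required")
    if score_mode == "multiply":
        return reduce(operator.mul, scores, 1.0)
    if score_mode == "sum":
        return sum(scores)
    if score_mode == "avg":
        # Simplification: plain mean, without per-function weights.
        return sum(scores) / len(scores)
    if score_mode == "first":
        return scores[0]
    if score_mode == "max":
        return max(scores)
    if score_mode == "min":
        return min(scores)
    raise ValueError(f"unknown score_mode: {score_mode!r}")
```

With a single function, as in the query above, every mode returns that function's score unchanged, so only boost_mode matters there.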