I want to normalize my Elasticsearch query scores into the range (0, 1), and I'm using the predefined Painless saturation() function for this. It takes the _score as the value to be normalized and a second argument for the pivot, as in this example:
{
  "_source": ["id", "first_name", "last_name"],
  "from": 0,
  "size": 10000,
  "query": {
    "script_score": {
      "query": {
        // my query here
      },
      "script": {
        "source": "saturation(_score, 10)"
      }
    }
  }
}
When I use 10 or another number for the pivot, it works as expected. But more in-depth docs, such as the Saturation section of the Rank feature query documentation, say:
If a pivot value is not provided, Elasticsearch computes a default value equal to the approximate geometric mean of all rank feature values in the index. We recommend using this default value if you haven’t had the opportunity to train a good pivot value.
I haven't trained a good pivot. Running a few queries, I noticed that the maximum _score varies: for some samples it was as low as 10, and for others it was higher than 100. So an arbitrary pivot could work well for some cases and not for others.
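To make the concern concrete, here is a minimal sketch of how a fixed pivot treats those two ranges. It reuses the formula value / (pivot + value) from the ScoreScriptUtils source quoted further down; the class name SaturationPivotDemo and the two sample scores are only illustrative, not real query output.

```java
class SaturationPivotDemo {
    // Same formula as ScoreScriptUtils.saturation: value / (pivot + value).
    public static double saturation(double value, double pivot) {
        return value / (pivot + value);
    }

    public static void main(String[] args) {
        double pivot = 10.0;                 // the fixed pivot from the query above
        double[] topScores = {10.0, 100.0};  // rough maximum _score values (illustrative)
        for (double score : topScores) {
            // Print the raw score next to its normalized value.
            System.out.println(score + " -> " + saturation(score, pivot));
        }
    }
}
```

With a pivot of 10, a top score of 10 maps to exactly 0.5, while a top score of 100 maps to roughly 0.91, so the same pivot spreads the results very differently depending on the query.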
How do I set up the query above to use the default pivot value? I tried a few things and got exceptions: saturation(_score), saturation(_score,), saturation(_score, null), etc.
I suspect it's not possible for this use case. Here is the source of ScoreScriptUtils:
public final class ScoreScriptUtils {

    /****** STATIC FUNCTIONS that can be used by users for score calculations **/

    public static double saturation(double value, double k) {
        return value / (k + value);
    }
    //...
It seems the default pivot can only be applied to a numeric value stored in a rank_feature or rank_features field, which is not the same as the search _score.
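If that is right, the closest thing I can think of is computing a pivot myself. The sketch below is only an assumption about how that could look: it takes the geometric mean of a handful of sampled _score values (the numbers are made up) and uses it as the pivot. This is not what Elasticsearch does internally for rank_feature fields, and the class name GeometricMeanPivot is hypothetical.

```java
class GeometricMeanPivot {
    // Geometric mean computed as exp(mean(log(x))).
    // Assumes a non-empty array of strictly positive scores.
    public static double geometricMean(double[] scores) {
        double logSum = 0.0;
        for (double s : scores) {
            logSum += Math.log(s);
        }
        return Math.exp(logSum / scores.length);
    }

    // Same formula as ScoreScriptUtils.saturation.
    public static double saturation(double value, double pivot) {
        return value / (pivot + value);
    }

    public static void main(String[] args) {
        double[] sampleScores = {1.0, 10.0, 100.0}; // made-up _score samples
        double pivot = geometricMean(sampleScores);
        System.out.println("pivot = " + pivot);
        // A score equal to the pivot always normalizes to 0.5.
        System.out.println("saturation(pivot, pivot) = " + saturation(pivot, pivot));
    }
}
```

The computed value would then have to be passed into the script explicitly, for example through script params, so it would only be as good as the sample of scores it was derived from.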