I am trying to store objects in Elasticsearch that are represented only by numeric fields. In my case, each object has 300 float fields and 1 id field. I have mapped the id field as not_analyzed. I am able to store the documents in ES. Here is an excerpt of one stored document:
"_index": "smart_content5",
"_type": "doc2vec",
"_id": "AVtAGeaZjLL5cvd8z9y7",
"_score": 1,
"_source": {
  "feature_227": 0.0856793,
  "feature_5": -0.115823,
  "feature_119": -0.0379987,
  "feature_145": 0.17952,
  "feature_29": 0.0444945,
Now I want to run a query that has the same 300 fields but with different numerical values, and find the document whose 300 fields are "most similar" to the query's fields. This is something like cosine similarity, but I want ES to do it so that it is fast.
(1) First of all, is what I am trying to do even possible?
(2) Second, I explored the function_score feature of ES and tried using it, but the maximum match score it returns is 0.0.
Any comments on what I should use, and on what I might be doing wrong in (2)?
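For reference, here is a minimal sketch of the cosine similarity mentioned above, computed outside ES. It assumes a document and a query are each available as a dict keyed by feature name; the function name `cosine_similarity` and the query values are made up for illustration.

```python
import math


def cosine_similarity(doc, query):
    """Cosine similarity between two feature dicts keyed by field name."""
    # Dot product over the fields present in both dicts.
    dot = sum(doc[k] * query[k] for k in doc.keys() & query.keys())
    norm_doc = math.sqrt(sum(v * v for v in doc.values()))
    norm_query = math.sqrt(sum(v * v for v in query.values()))
    if norm_doc == 0.0 or norm_query == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_doc * norm_query)


# Three of the stored fields from the excerpt above.
doc = {"feature_227": 0.0856793, "feature_5": -0.115823, "feature_119": -0.0379987}
# Hypothetical query values, not taken from real data.
query = {"feature_227": 0.09, "feature_5": -0.10, "feature_119": -0.05}

print(cosine_similarity(doc, query))
```

Computing this for every stored document is a linear scan, which is why doing the ranking inside ES is attractive.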
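I think you still need function_score, but like this (it worked for me). Each gauss function scores a document by how close that field is to origin, so set each origin to the query's value for that field; the 0 below is only a placeholder: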
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "gauss": {
            "feature_227": {
              "origin": "0",
              "scale": "0.5"
            }
          }
        },
        {
          "gauss": {
            "feature_5": {
              "origin": "0",
              "scale": "0.5"
            }
          }
        },
        {
          "gauss": {
            "feature_119": {
              "origin": "0",
              "scale": "0.5"
            }
          }
        },
        {
          "gauss": {
            "feature_145": {
              "origin": "0",
              "scale": "0.5"
            }
          }
        },
        {
          "gauss": {
            "feature_29": {
              "origin": "0",
              "scale": "0.5"
            }
          }
        }
      ],
      "score_mode": "sum"
    }
  }
}
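Writing 300 of these gauss blocks by hand is tedious, so the request body can be generated. The following is a minimal sketch, assuming the query vector is a Python dict keyed by field name; the helper name `build_gauss_query` and the sample values are hypothetical. Note that summing per-field Gaussian decays rewards documents whose fields are individually close to the query values, which approximates "most similar" but is not the same thing as cosine similarity.

```python
import json


def build_gauss_query(query_vector, scale=0.5):
    """Build a function_score body with one gauss decay per numeric field."""
    functions = [
        # origin is the query's value for this field; the score decays
        # as the document's value moves away from it.
        {"gauss": {field: {"origin": value, "scale": scale}}}
        for field, value in query_vector.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "score_mode": "sum",
            }
        }
    }


# Hypothetical query values for two of the 300 fields.
query_vector = {"feature_227": 0.09, "feature_5": -0.10}
body = build_gauss_query(query_vector)

# The resulting JSON can be sent as the body of a _search request.
print(json.dumps(body, indent=2))
```

The `scale` of 0.5 is carried over from the query above; since the stored values are on the order of 0.1, a smaller scale would probably separate documents more sharply, but that is something to tune against real data.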