python elasticsearch similarity vector-space

Indexing documents with only numeric fields in Elasticsearch


I am trying to store objects in Elasticsearch that are represented only by numeric fields. In my case each object has 300 float fields and 1 id field. I have mapped the id field as not_analyzed. I am able to store the documents in ES:

    "_index": "smart_content5",
    "_type": "doc2vec",
    "_id": "AVtAGeaZjLL5cvd8z9y7",
    "_score": 1,
    "_source": {
      "feature_227": 0.0856793,
      "feature_5": -0.115823,
      "feature_119": -0.0379987,
      "feature_145": 0.17952,
      "feature_29": 0.0444945,

Now I want to run a query represented by the same 300 fields but with different numerical values, and find the document whose 300 fields are "most similar" to the query's fields. This is something like computing cosine similarity, but I am trying to do it inside ES so that it is fast.
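For reference, the cosine similarity mentioned above can be sketched in plain Python. This is only an illustration of the measure itself, not something ES runs; the function name `cosine_similarity` is made up for this example, and both vectors are assumed to be equal-length lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; return 0.0 instead of dividing by zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Computing this against every stored document on the client side is the slow path the question is trying to avoid.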

(1) First of all, is it even possible to do this?

(2) Second, I have explored the function_score feature of ES and tried using it, but the maximum match score it returns is 0.0!

Any comments on what I should use, and on what I might be doing wrong in (2)?


Solution

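  • I think you still need function_score, but like this (it worked for me): one gauss decay function per feature field, with the per-field scores summed. Each gauss function scores a document by how close its field value is to origin, so origin should be the query's value for that field; the example below uses 0 for every field.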

    {
      "query": {
        "function_score": {
          "query": {},
          "functions": [
            {
              "gauss": {
                "feature_227": {
                  "origin": "0",
                  "scale": "0.5"
                }
              }
            },
            {
              "gauss": {
                "feature_5": {
                  "origin": "0",
                  "scale": "0.5"
                }
              }
            },
            {
              "gauss": {
                "feature_119": {
                  "origin": "0",
                  "scale": "0.5"
                }
              }
            },
            {
              "gauss": {
                "feature_145": {
                  "origin": "0",
                  "scale": "0.5"
                }
              }
            },
            {
              "gauss": {
                "feature_29": {
                  "origin": "0",
                  "scale": "0.5"
                }
              }
            }
          ],
          "score_mode": "sum"
        }
      }
    }
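    Writing out 300 such clauses by hand is impractical, so the request body can be generated instead. Below is a minimal sketch in Python under several assumptions: the helper name `build_similarity_query` is hypothetical, the query vector is a list of floats whose index i corresponds to the field `feature_i` (adjust if your numbering starts elsewhere), `origin` is set to the query's value for each field, the scale of 0.5 is copied from the query above, and `match_all` is used as the base query in place of the empty `{}`.

```python
import json


def build_similarity_query(query_vector, scale=0.5):
    """Build a function_score body with one gauss decay clause per feature.

    Hypothetical helper: assumes query_vector[i] belongs to field "feature_<i>".
    """
    functions = [
        # Values are sent as strings, mirroring the hand-written query above.
        {"gauss": {"feature_%d" % i: {"origin": str(value), "scale": str(scale)}}}
        for i, value in enumerate(query_vector)
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},  # assumption: score every document
                "functions": functions,
                "score_mode": "sum",  # add up the per-field closeness scores
            }
        }
    }


if __name__ == "__main__":
    # Tiny 3-dimensional vector for illustration; a real query would have 300 values.
    body = build_similarity_query([0.1, -0.2, 0.05])
    print(json.dumps(body, indent=2))
```

    The resulting dictionary can be sent as the search request body. Note that summed gauss scores reward per-field closeness, which is related to but not the same as cosine similarity.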