
Query string with default_operator as AND works differently in ES 2.4 and ES 6.8


The query_string query appears to behave differently in the two versions. In ES 2.4 it returns documents in which every term matches in at least one of the fields listed in the fields parameter, whereas in ES 6.8 it returns only documents in which a single field contains all of the terms.

Steps to reproduce: Insert the following documents into both ES 2.4 and ES 6.8 clusters:

PUT yields_test/il4/1
{
    "security_name":"high term1",
    "doc_type":"YIELDS"
}

PUT yields_test/il4/2
{
    "security_name":"high term2",
    "doc_type":"YIELDS"
}

PUT yields_test/il4/3
{
    "security_name":"low term2",
    "doc_type":"YIELDS"
}

PUT yields_test/il4/4
{
    "security_name":"low term1",
    "doc_type":"YIELDS"
}

PUT yields_test/il4/5
{
    "security_name":"high yield",
    "doc_type":"YIELDS"
}

PUT yields_test/il4/6
{
    "security_name":"high yields",
    "doc_type":"YIELDS"
}

PUT yields_test/il4/7
{
    "security_name":"high term3",
    "doc_type":"YIELD"
}

Then run the following query_string query against both clusters:

GET yields_test/_search
{
    "query": {
        "query_string": {
            "fields": [
                "security_name",
                "doc_type"
            ],
            "query": "high yield",
            "default_operator": "AND"
        }
    }
}

ES 2.4 returns the following documents:

[
         {
            "_index": "yields_test",
            "_type": "il4",
            "_id": "7",
            "_score": 0.5098911,
            "_source": {
               "security_name": "high term3",
               "doc_type": "YIELD"
            }
         },
         {
            "_index": "yields_test",
            "_type": "il4",
            "_id": "5",
            "_score": 0.08322528,
            "_source": {
               "security_name": "high yield",
               "doc_type": "YIELDS"
            }
         }
      ]

whereas ES 6.8 returns the following:

[
         {
            "_index": "yields_test",
            "_type": "il4",
            "_id": "5",
            "_score": 0.5753642,
            "_source": {
               "security_name": "high yield",
               "doc_type": "YIELDS"
            }
         }
]

I was able to find what changed between the versions by setting "profile": true in the request body. The two versions generate different Lucene queries.

Lucene query generated by ES 2.4: +(security_name:high | doc_type:high) +(security_name:yield | doc_type:yield)

Lucene query generated by ES 6.8: ((+security_name:high +security_name:yield) | (+doc_type:high +doc_type:yield))
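To make the difference between the two query shapes concrete, here is a small toy model in Python. It is not Elasticsearch code: the function names are made up for illustration, and lowercasing plus whitespace splitting is only an assumed stand-in for the analyzer applied to these fields. It evaluates both shapes against the seven documents above.

```python
# Toy model of the two Lucene query shapes; not Elasticsearch code.
DOCS = {
    "1": {"security_name": "high term1", "doc_type": "YIELDS"},
    "2": {"security_name": "high term2", "doc_type": "YIELDS"},
    "3": {"security_name": "low term2", "doc_type": "YIELDS"},
    "4": {"security_name": "low term1", "doc_type": "YIELDS"},
    "5": {"security_name": "high yield", "doc_type": "YIELDS"},
    "6": {"security_name": "high yields", "doc_type": "YIELDS"},
    "7": {"security_name": "high term3", "doc_type": "YIELD"},
}
FIELDS = ["security_name", "doc_type"]


def tokens(text):
    # Assumption: lowercase + whitespace split approximates the analyzer.
    return set(text.lower().split())


def term_centric(doc, terms):
    # ES 2.4 shape: +(f1:t1 | f2:t1) +(f1:t2 | f2:t2)
    # Every term must match, but each term may match in any field.
    return all(any(t in tokens(doc[f]) for f in FIELDS) for t in terms)


def field_centric(doc, terms):
    # ES 6.8 shape: (+f1:t1 +f1:t2) | (+f2:t1 +f2:t2)
    # At least one single field must contain all of the terms.
    return any(all(t in tokens(doc[f]) for t in terms) for f in FIELDS)


def matching_ids(matcher, query):
    terms = query.lower().split()
    return sorted(i for i, d in DOCS.items() if matcher(d, terms))


print(matching_ids(term_centric, "high yield"))   # ['5', '7']
print(matching_ids(field_centric, "high yield"))  # ['5']
```

Document 7 matches only under the first shape, because "high" is found in security_name and "yield" in doc_type, and no single field contains both terms.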

My remaining question is how to make ES 6.8 return the same documents as ES 2.4.


Solution

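  • You could use a bool query with one query_string clause per term under must, so that every term is required but each may match in any of the listed fields: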

    {
      "query": {
        "bool": {
          "must": [
            {
              "query_string": {
                "fields": ["security_name", "doc_type"],
                "query": "high"
              }
            },
            {
              "query_string": {
                "fields": ["security_name", "doc_type"],
                "query": "yield"
              }
            }
          ]
        }
      }
    }
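If the search text comes from user input, the same request body can be generated instead of written by hand. The following is a minimal sketch in Python. The helper name build_term_centric_query is hypothetical, and it assumes that terms are separated by whitespace and contain no query_string syntax (quotes, operators, wildcards), since nothing is escaped here.

```python
import json


def build_term_centric_query(query, fields):
    # Hypothetical helper: emits one query_string clause per term.
    # Assumption: a plain whitespace split is enough to separate terms.
    clauses = [
        {"query_string": {"fields": list(fields), "query": term}}
        for term in query.split()
    ]
    # "must" requires every clause, so every term has to match somewhere.
    return {"query": {"bool": {"must": clauses}}}


body = build_term_centric_query("high yield", ["security_name", "doc_type"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET yields_test/_search. Phrases or operators in the input would need extra handling before splitting.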