elasticsearch, elasticsearch-7

Case-insensitive exact match in Elasticsearch


I need to query an Elasticsearch index to see whether any documents already have a specific value for the field shown below:

"name" : {
      "type" : "text",
      "fields" : {
        "raw" : {
          "type" : "keyword"
        }
      }
}

I was initially going to do this using a normalizer, but I'm hoping to avoid making changes to the index itself. I then found the match_phrase query, which does almost exactly what I need. The problem is that it also returns partial matches, as long as they start off the same. For example, if I'm searching for the value "this is a test", it will return results for the following values:

In my situation I can do another check in code once the data is returned to see whether it is in fact a case-insensitive exact match. However, I'm relatively new to Elasticsearch, so I'm wondering: is there any way to structure my original match_phrase query so that it does not return the examples I posted above?


Solution

  • For anyone who is interested, I found a few different ways to do this. The first is to run a match_phrase query and add a script filter that checks the length of name.raw (9 is the length of "Test Name"):

    GET definitions/_search
    {
      "query": {
        "bool":{
          "must":{
            "match_phrase":{
              "name":{
                "query":"Test Name"
              }
            }
          },
          "filter": [
            {
              "script": {
                "script": {
                  "source": "doc['name.raw'].value.length() == 9",
                  "lang": "painless"
                }
              }
            }
          ]
        }
      }
    }
    

    Then I figured that if I could check the length in a script, I could just do the case-insensitive comparison in the script itself and drop the match_phrase clause:

    GET definitions/_search
    {
      "query": {
        "bool": { 
          "filter": [
            {
              "script": {
                "script": {
                  "source": "doc['name.raw'].value.toLowerCase() == 'test name'",
                  "lang": "painless"
                }
              }
            }
          ]
        }
      }
    }
    

    So those are the options. In my case I was concerned about performance, so we bit the bullet and created a normalizer that allows case-insensitive comparisons, and neither of these queries ended up being used. I figured I should post them here anyway, since I wasn't able to find these answers anywhere else.
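
    For reference, the normalizer approach mentioned above might look roughly like the following. This is only a sketch, not the exact configuration we used: the index name definitions_v2 and the normalizer name lowercase_normalizer are made up for illustration, and it assumes a new index is created and the existing documents are reindexed into it.

    # Hypothetical index and normalizer names, for illustration only.
    PUT definitions_v2
    {
      "settings": {
        "analysis": {
          "normalizer": {
            "lowercase_normalizer": {
              "type": "custom",
              "filter": ["lowercase"]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "lowercase_normalizer"
              }
            }
          }
        }
      }
    }

    # The term query input is normalized too, so the casing used here should not matter.
    GET definitions_v2/_search
    {
      "query": {
        "term": {
          "name.raw": "Test Name"
        }
      }
    }

    Because the value is lowercased once at index time, the lookup is an ordinary term match rather than a script evaluated per document. If you are on Elasticsearch 7.10 or later, the term query also appears to accept a case_insensitive option, which may avoid the mapping change altogether; check the documentation for your version before relying on it.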