
How to create and add values to a standard analyzer without lowercasing in Elasticsearch


I've been around the houses with this for the past few days, trying things in various orders, but I can't figure out why it's not working.

I am trying to create an index in Elasticsearch with an analyzer that is the same as the "standard" analyzer but preserves uppercase characters (i.e. does not lowercase tokens) when records are stored.
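For reference, the Elasticsearch 6.x documentation describes the built-in standard analyzer as the standard tokenizer followed by the standard, lowercase and (disabled by default) stop token filters, and shows it rebuilt as a custom analyzer roughly as below. The index name standard_example is only illustrative. Removing "lowercase" from the filter list is what should give the case-preserving variant asked about here.

```
PUT /standard_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_standard": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase"
          ]
        }
      }
    }
  }
}
```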

I create my analyzer and index as follows:

PUT /upper
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "rebuilt_standard": {
            "tokenizer": "standard",
            "filter": [
              "standard"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "rebuilt_standard"
        }
      }
    }
  }
}
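To confirm that the custom analyzer itself preserves case, independently of any field mapping, the _analyze API can be called against the index with the analyzer name. Assuming the index above was created successfully, the response should list the tokens TEST and test unchanged:

```
GET /upper/_analyze
{
  "analyzer": "rebuilt_standard",
  "text": "TEST test"
}
```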

Then I add two test records like this:

POST /upper/doc
{
  "text": "TEST"
}

and a second record:

POST /upper/doc
{
  "text": "test"
}

Running GET /upper/_settings returns the following:

{
  "upper": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "upper",
        "creation_date": "1537788581060",
        "analysis": {
          "analyzer": {
            "rebuilt_standard": {
              "filter": [
                "standard"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "s4oDgdsFTxOwsdRuPAWEkg",
        "version": {
          "created": "6030299"
        }
      }
    }
  }
}
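The settings only show that the analyzer is defined, not which fields use it. The mapping is where the cause would be expected to show up: in this index, text should appear as a dynamically added field with no analyzer setting (so the default standard analyzer applies), shaped like the second fragment below, which is the 6.x default dynamic mapping for a string value.

```
GET /upper/_mapping
```

```
"text": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
```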

But when I search with the following query, I still get two matches: both the uppercase and the lowercase record. That must mean the analyzer is not being applied when I store the records.

The search:

GET /upper/_search
{
  "query": {
    "term": {
      "text": {
        "value": "test"
      }
    }
  }
}
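For context, a term query is not analyzed: the value "test" is compared as-is against the tokens stored in the index. The _analyze API shows what the default standard analyzer stores for "TEST"; the response should contain the single token test, which would explain why the uppercase document matches too.

```
GET /_analyze
{
  "analyzer": "standard",
  "text": "TEST"
}
```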

Thanks in advance!


Solution

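  • First things first: you set your analyzer on the title field instead of on the text field. Your search runs against the text property, and the documents you index contain only a text property, so the title mapping is never used. Because text is not in the mapping, Elasticsearch adds it dynamically as a text field with the default standard analyzer, which lowercases tokens; that is why both "TEST" and "test" match a term query for "test".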

    "properties": {
        "title": { 
            "type": "text",
            "analyzer": "rebuilt_standard"
        }
    }
    

    Try:

    "properties": {
        "text": { 
            "type": "text",
            "analyzer": "rebuilt_standard"
        }
    }
    

    And keep us posted ;)
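One more step is likely needed: the analyzer of an existing field cannot be changed in place, and indexing the first document already mapped text dynamically with the standard analyzer. A minimal sketch, assuming the test index can simply be deleted and that the cluster is on 6.x (the settings report "created": "6030299") so the doc mapping type from the question still applies, is to recreate the index with the analyzer on text, index the two documents again and repeat the term query, which should then match only the lowercase document:

```
# Drop the index whose "text" field was mapped dynamically
DELETE /upper

PUT /upper
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_standard": {
          "tokenizer": "standard",
          # "standard" is a no-op token filter in 6.x (removed in 7.0); no "lowercase"
          "filter": ["standard"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "rebuilt_standard"
        }
      }
    }
  }
}

# ?refresh makes each document searchable immediately
POST /upper/doc?refresh
{ "text": "TEST" }

POST /upper/doc?refresh
{ "text": "test" }

GET /upper/_search
{
  "query": {
    "term": { "text": "test" }
  }
}
```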