
How to use the Elasticsearch standard analyzer without lowercasing


I'm trying to create an analyzer in Elasticsearch that uses the presets of the "standard" analyzer but with one change: no lowercasing of words.

I've tried combining the whitespace and standard analyzers like so:

PUT /standard_uppercase
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_standard": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "whitespace"
          ]
        }
      }
    }
  }
}

But this does not give the required results. Is there a way to override only the lowercase part of an analyzer but retain all the other features of the standard analyzer?

Thanks in advance.


Solution

  • According to the documentation:

    Definition

    The standard analyzer consists of:

    Tokenizer

        Standard Tokenizer

    Token Filters

        Standard Token Filter
        Lower Case Token Filter
        Stop Token Filter (disabled by default)


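    So you can get the standard analyzer's behavior without lowercasing by defining a custom analyzer that keeps the standard tokenizer and the standard token filter and simply leaves out the lowercase filter: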

    PUT /standard_uppercase
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "rebuilt_standard": {
              "tokenizer": "standard",
              "filter": [
                "standard"   
              ]
            }
          }
        }
      }
    }
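
    To check the result, you can run some text through the new analyzer with the _analyze API. This is only a sketch: the index name standard_uppercase and the analyzer name rebuilt_standard come from the request above, and the sample text is made up for illustration.

    POST /standard_uppercase/_analyze
    {
      "analyzer": "rebuilt_standard",
      "text": "The QUICK Brown-Foxes"
    }

    The tokens in the response should keep their original case (The, QUICK, Brown, Foxes) while still being split the way the standard tokenizer splits them. Note that on newer Elasticsearch versions the "standard" token filter may no longer be available (as far as I know it was removed in 7.0 because it did not change the token stream); in that case an empty "filter" list should give the same result.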