Tags: java, elasticsearch, stemming

Search is not working in Elasticsearch for words ending in 's', 'y', or 'e'


If the search string and the target value end with any of the following characters, the search doesn't work: s, y, e.

In our application, one user is named Granny Smith. Searching for Granny returned no records because the name ends with y. The same happened with names ending in s and e, e.g. James and Katie.


Solution

  • The root cause of the issue is the stemmer. According to the Elasticsearch docs, algorithmic stemmers apply a series of rules to each word to reduce it to its root form.

    For example, an algorithmic stemmer for English may remove the -s and -es suffixes from the end of plural words. The English stemmer also rewrites other endings (a trailing y becomes i, and a trailing e can be dropped), so names such as Granny, James, and Katie are no longer indexed in the form the user types. For more detail, see:
    https://www.elastic.co/guide/en/elasticsearch/reference/current/stemming.html#algorithmic-stemmers
    https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html

    In your application's mapping.json file, remove any stemmer configuration that is already present. In the settings below, that means the "custom_english_stemmer" filter definition and the "custom_english_stemmer" entry in the filter list of each analyzer. These are marked with // comments, which are annotations only and are not valid JSON.

    "settings": {
      "analysis": {
        // Remove the whole "filter" element below
        "filter": {
          "custom_english_stemmer": {
            "type": "stemmer",
            "name": "english"
          }
        },
        "normalizer": {
          "useLowercase": {
            "type": "custom",
            "filter": [
              "lowercase"
            ]
          }
        },
        "tokenizer": {
          "custom_tokenizer": {
            "type": "ngram",
            "min_gram": 1,
            "max_gram": 10,
            "token_chars": [
              "letter",
              "digit"
            ]
          }
        },
        "analyzer": {
          "NGram_analyzer": {
            "tokenizer": "custom_tokenizer",
            "filter": [
              "lowercase",
              // Remove the "custom_english_stemmer" entry below
              "custom_english_stemmer",
              "asciifolding"
            ]
          },
          "custom_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              // Remove the "custom_english_stemmer" entry below
              "custom_english_stemmer",
              "asciifolding"
            ],
            "type": "custom"
          }
        }
      },
      "max_ngram_diff": "50"
    }

    If your application has no searchable free-text field (such as a description) where plural forms need to match their singular, you can remove the stemmer from your configuration and searches on names should work as expected.
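The mismatch can be reproduced without a cluster. The sketch below is a toy, not Lucene's actual Porter stemmer: the hypothetical class `ToyStemmer` applies at most one of three simplified suffix rules (strip a trailing s, turn a trailing y into i, drop a trailing e). For these four names it happens to produce the same tokens as the Porter algorithm, but the real analyzer output should be checked against your index. It also assumes, because the question does not show the query, that the query text reaches the index without passing through the same stemmer.

```java
import java.util.List;
import java.util.Locale;

/** Toy illustration only: three simplified Porter-style suffix rules. */
public class ToyStemmer {

    /** True if the string contains at least one of the vowels a, e, i, o, u. */
    private static boolean containsVowel(String s) {
        for (char c : s.toCharArray()) {
            if ("aeiou".indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }

    /**
     * Applies at most one simplified suffix rule to a word.
     * The real Porter algorithm has many more rules and conditions.
     */
    public static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        String head = w.isEmpty() ? w : w.substring(0, w.length() - 1);
        if (w.endsWith("s") && !w.endsWith("ss")) {
            return head;                 // "james" -> "jame"
        }
        if (w.endsWith("y") && containsVowel(head)) {
            return head + "i";           // "granny" -> "granni"
        }
        if (w.endsWith("e")) {
            return head;                 // "katie" -> "kati"
        }
        return w;                        // "smith" is left unchanged
    }

    public static void main(String[] args) {
        for (String name : List.of("Granny", "Smith", "James", "Katie")) {
            String indexed = stem(name);                   // token a stemming analyzer would store
            String query = name.toLowerCase(Locale.ROOT);  // query text that skipped the stemmer
            System.out.printf("%-6s indexed as %-6s equals unstemmed query: %b%n",
                    name, indexed, indexed.equals(query));
        }
    }
}
```

Running it prints `false` for Granny, James, and Katie and `true` for Smith, which matches the symptom in the question: only the names whose endings the stemmer rewrites stop matching.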
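Removing the stemmer only affects text analyzed from then on. Documents already in the index keep their stemmed tokens, so in practice the index usually has to be recreated with the new settings and the data reindexed. To confirm what an analyzer actually produces before and after the change, Elasticsearch's `_analyze` API can be called against the index. The sketch below uses the JDK 11+ `java.net.http` client. The host `http://localhost:9200` (with no authentication), the index name `my-index`, and the class name `AnalyzeCheck` are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AnalyzeCheck {

    // Assumption: a local cluster without security enabled.
    static final String BASE_URL = "http://localhost:9200";

    /** Minimal JSON string quoting: escapes only backslashes and double quotes. */
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Request body for POST /<index>/_analyze using an analyzer defined in the index settings. */
    public static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":" + jsonString(analyzer) + ",\"text\":" + jsonString(text) + "}";
    }

    /** Builds the _analyze request; no network call happens here. */
    public static HttpRequest analyzeRequest(String index, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // "my-index" is a hypothetical index name; "custom_analyzer" comes from the settings above.
        HttpRequest request = analyzeRequest("my-index", "custom_analyzer", "Granny");
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // The response lists the tokens the analyzer emits for the text.
        System.out.println(response.body());
    }
}
```

With the stemmer still configured, the token list for "Granny" should show a rewritten form; after the stemmer is removed and the index recreated, it should show "granny" unchanged.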