
Elasticsearch - Adding case-insensitive exact match search to custom analyzer


I have an index with the following mapping:

{
  "entities": {
    "mappings": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "stop_delimiter_stemmer_analyzer"
        }
      }
    }
  }
}

And the following is stop_delimiter_stemmer_analyzer (my custom analyzer):

"analysis": {
  "analyzer": {
    "stop_delimiter_stemmer_analyzer": {
      "tokenizer": "whitespace",
      "filter": [
        "word_delimiter_graph",
        "german_stemmer",
        "english_stemmer",
        "french_stemmer",
        "italian_stemmer",
        "multi_language_stopwords"
      ]
    }
  },
  "filter": {
    "german_stemmer": {
      "type": "stemmer",
      "name": "light_german"
    },
    "english_stemmer": {
      "type": "stemmer",
      "name": "english"
    },
    "french_stemmer": {
      "type": "stemmer",
      "name": "light_french"
    },
    "italian_stemmer": {
      "type": "stemmer",
      "name": "light_italian"
    },
    "multi_language_stopwords": {
      "type": "stop",
      "stopwords": [
        "_english_",
        "_french_",
        "_italian_",
        "_dutch_"
      ]
    }
  }
}

If I use a match query to search for Preuve à futur, Elasticsearch returns it as the first result.
But if I search for preuve à futur, the same document ranks much lower.

I need case-insensitive exact matching, so that exact matches (whether or not the case matches) appear among the first results.
How can I do that?
Thanks.

Note: I use Elasticsearch 7.16


Solution

  • Add the lowercase token filter as the first item in your analyzer's filter list. The whitespace tokenizer preserves case, so without it Preuve and preuve are indexed as different terms. With it, all tokens are indexed in lowercase, and because the match query analyzes the search string with the same analyzer, the query terms are lowercased too, so matching becomes case-insensitive.

    "filter": [
      "lowercase",
      "word_delimiter_graph",
      "german_stemmer",
      "english_stemmer",
      "french_stemmer",
      ...
    ]
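
    One way to check the effect is the _analyze API. The following is only a sketch: it assumes the index is still named entities, as in the question, and that the analyzer has already been updated with the lowercase filter. Sending this body to GET /entities/_analyze should list the tokens the analyzer produces for the given text:

```json
{
  "analyzer": "stop_delimiter_stemmer_analyzer",
  "text": "Preuve à futur"
}
```

    Running it again with preuve à futur as the text should give the same tokens once lowercase is in place. Note that documents indexed before the change keep their old tokens until they are reindexed, so existing data will likely need a reindex before the ranking improves.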