
Elasticsearch analyzer settings and matching data


I'm trying an example that uses the same settings as the documentation when creating an index:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": { 
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": { 
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": { 
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": { 
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

Then I index a document:

POST /my-index-000003/_doc/1
{
  "content": "I'm feeling :) today, but the weather is quite gloomy :("
}

However, when I search for :) or happy, I can't find a match. Why?


Solution

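  • At indexing time, the emoticons character filter replaces :) with _happy_ and :( with _sad_, so the index contains those tokens rather than the emoticons themselves. A query for :) or :( only matches if the query text is run through the same analyzer, which turns it into _happy_ or _sad_ as well.

    If you don't want your emoticons to be replaced, use a synonym token filter instead of a character filter.

    A search for happy will not find _happy_, because the indexed token includes the underscores, but a search for _happy_ does work. I was able to reproduce this, and the following query returned the document: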

    POST test/_search
    {
      "query": {
        "match": {
          "content": "_happy_"
        }
      }
    }
    

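    Note that this only works if the content field is mapped with my_custom_analyzer. Otherwise the field falls back to the default standard analyzer, which drops :) and :( as punctuation. The mapping looks like this: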

      "mappings": {
        "properties": {
          "content": {
            "type": "text",
            "analyzer": "my_custom_analyzer"
          }
        }
      }
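
    To see which tokens actually end up in the index, you can run the _analyze API against it. The requests below are a sketch that assumes the index from the question (my-index-000003) was created with the settings shown there. The first request names the analyzer explicitly; the second uses whichever analyzer the content field is mapped to.

    ```json
    POST /my-index-000003/_analyze
    {
      "analyzer": "my_custom_analyzer",
      "text": "I'm feeling :) today, but the weather is quite gloomy :("
    }

    POST /my-index-000003/_analyze
    {
      "field": "content",
      "text": "I'm feeling :) today, but the weather is quite gloomy :("
    }
    ```

    The first response should include the tokens _happy_ and _sad_. If the second response contains neither, the content field is not using my_custom_analyzer, which would explain why neither :) nor _happy_ matches.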