I'm trying an example that uses the same settings as the documentation when creating an index:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}
Then I index a document:
POST /my-index-000003/_doc/1
{
  "content": "I'm feeling :) today, but the weather is quite gloomy :("
}
However, when I search for :) or happy, I can't find a match. Why?
At indexing time, :) gets replaced with _happy_ and :( with _sad_, so you cannot search for :) or :( anymore.
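One way to see this replacement for yourself is the _analyze API, which returns the tokens an analyzer produces for a piece of text. A minimal sketch, assuming the index from the question (my-index-000003) was created with the settings shown there:

```
POST /my-index-000003/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "I'm feeling :) today, but the weather is quite gloomy :("
}
```

The response should list _happy_ and _sad_ among the tokens, with no :), :(, or plain happy token.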
If you don't want your emoticons to be replaced, you need to use a synonym token filter instead of a character filter.
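A rough, untested sketch of what such a setup could look like, keeping the punctuation tokenizer from the question. The index name my-synonym-index, the analyzer name my_synonym_analyzer, the filter name emoticon_synonyms, and the synonym rules themselves are illustrative assumptions, not something taken from the question:

```
# Hypothetical names: my-synonym-index, my_synonym_analyzer, emoticon_synonyms
PUT /my-synonym-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_synonym_analyzer": {
          "tokenizer": "punctuation",
          "filter": ["lowercase", "emoticon_synonyms"]
        }
      },
      "tokenizer": {
        "punctuation": { "type": "pattern", "pattern": "[ .,!?]" }
      },
      "filter": {
        "emoticon_synonyms": {
          "type": "synonym",
          "synonyms": [":) => :), happy", ":( => :(, sad"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": { "type": "text", "analyzer": "my_synonym_analyzer" }
    }
  }
}
```

With rules like these, the emoticon token is kept and a plain-word token is added at the same position, so a match query for either :) or happy should find the document.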
If you search for happy, that will not find _happy_, but if you search for _happy_, it will work. I was able to reproduce this, and it worked with the following query:
POST test/_search
{
  "query": {
    "match": {
      "content": "_happy_"
    }
  }
}
Note that this will only work if your content field is configured with the my_custom_analyzer analyzer:
"mappings": {
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_custom_analyzer"
    }
  }
}
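To check which analyzer the field actually uses, the _analyze API also accepts a field name in place of an analyzer name. A minimal sketch, assuming the index from the question (my-index-000003):

```
GET /my-index-000003/_analyze
{
  "field": "content",
  "text": ":)"
}
```

If the mapping is in place, the response should contain a single _happy_ token. If content falls back to the default standard analyzer, no _happy_ token appears, which would suggest the custom analyzer was never applied to the field.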