I have a fuzzy search analyzer in elastic search with following documents
PUT test_index
{
"settings": {
"index": {
"max_ngram_diff": 40
},
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"autocomplete"
]
},
"autocomplete_search": {
"tokenizer": "whitespace",
"filter": [
"lowercase"
]
}
},
"filter": {
"autocomplete": {
"type": "ngram",
"min_gram": 2,
"max_gram": 40
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
}
}
}
}
PUT test_index/_doc/1
{ "title": "HRT 2018-BN18 N-SB" }
PUT test_index/_doc/2
{ "title": "GMC 2019-BN18 A-SB" }
How can i ignore the hyphen ('-') during my fuzzy search so that GMC 2019-BN18 A-SB , gmc 2019, gmc 2019-BN18 A-SB and GMC 2019-BN18 ASB yield the same document
I had tried to create another analyzer separately but i am not sure how can we apply multiple analyzer on the same field
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
]
}
},
"char_filter": {
"my_char_filter": {
"type": "mapping",
"mappings": [
"- => "
]
}
}
}
}
You're on the right path, you just need to add that character filter to both analyzers to make sure the hyphens get removed at indexing and search time:
PUT test_index
{
"settings": {
"index": {
"max_ngram_diff": 40
},
"analysis": {
"char_filter": {
"my_char_filter": {
"type": "mapping",
"mappings": [
"- => "
]
}
},
"analyzer": {
"autocomplete": {
"char_filter": [
"my_char_filter"
],
"tokenizer": "whitespace",
"filter": [
"lowercase",
"autocomplete"
]
},
"autocomplete_search": {
"char_filter": [
"my_char_filter"
],
"tokenizer": "whitespace",
"filter": [
"lowercase"
]
}
},
"filter": {
"autocomplete": {
"type": "ngram",
"min_gram": 2,
"max_gram": 40
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
}
}
}
}