I use a whitespace tokenizer with searchkick_stemmer:
"company" -> "compani"
"company+" -> "company+"
How can I make "company+" become "compani+" or ["compani", "+"]?
I've tried edge n-grams, which work fine but generate too many tokens. Is there another approach, such as conditional scripting?
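I put together the example below, but I also recommend reading about the pattern token filter. The example uses a pattern_replace character filter to insert a space before each "+", so the whitespace tokenizer splits it off and the stemmer sees "company" on its own.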
POST _analyze
{
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "[+]",
      "replacement": " $0"
    }
  ],
  "tokenizer": "whitespace",
  "filter": [
    "stemmer"
  ],
  "text": [
    "company+"
  ]
}
Tokens:
{
  "tokens": [
    {
      "token": "compani",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": "+",
      "start_offset": 7,
      "end_offset": 8,
      "type": "word",
      "position": 1
    }
  ]
}
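To apply the same analysis chain at index time rather than only through _analyze, the character filter could be registered in the index settings. This is a minimal sketch, not a tested Searchkick setup: the index name my-index and the names split_plus and stem_with_plus are placeholders I made up, and "stemmer" here is the built-in Elasticsearch stemmer (English by default), which stands in for searchkick_stemmer.

```
PUT my-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "split_plus": {
          "type": "pattern_replace",
          "pattern": "[+]",
          "replacement": " $0"
        }
      },
      "analyzer": {
        "stem_with_plus": {
          "type": "custom",
          "char_filter": ["split_plus"],
          "tokenizer": "whitespace",
          "filter": ["stemmer"]
        }
      }
    }
  }
}
```

Running POST my-index/_analyze with "analyzer": "stem_with_plus" and "text": "company+" should produce the same two tokens as above. How to pass these settings through Searchkick is not covered here.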