[SOLVED] Elasticsearch: stemming words that end with symbols/special characters

I use a whitespace tokenizer together with the searchkick_stemmer filter. A plain word gets stemmed, but a word that ends with a symbol does not:

"company" -> "compani"

"company+" -> "company+"

How can I get "company+" analyzed as "compani+" or as ["compani", "+"]?

I've tried edge n-grams, which work, but they generate too many tokens. Is there another approach, such as conditional scripting or something similar?

Solution

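Here is an example that uses a pattern_replace character filter to insert a space before "+". The whitespace tokenizer then splits "+" off into its own token, and the stemmer sees the bare word. I'd also recommend reading up on the pattern token filter.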

POST _analyze
{
  "tokenizer": "whitespace",
  "filter": [
    "stemmer"
  ],
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "[+]",
      "replacement": " $0"
    }
  ],
  "text": [
    "company+"
  ]
}

Tokens:

{
  "tokens": [
    {
      "token": "compani",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": "+",
      "start_offset": 7,
      "end_offset": 8,
      "type": "word",
      "position": 1
    }
  ]
}
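
If you want this applied when documents are indexed, rather than only through _analyze, the same chain can probably be registered as a custom analyzer in the index settings. Below is a minimal sketch, meant to be sent as the body of a create-index request (PUT followed by your index name). The names split_plus and stem_with_symbols are made up for illustration, and the built-in stemmer filter stands in for searchkick_stemmer, which I'm assuming is defined by Searchkick in your own index settings:

```json
{
  "settings": {
    "analysis": {
      "char_filter": {
        "split_plus": {
          "type": "pattern_replace",
          "pattern": "[+]",
          "replacement": " $0"
        }
      },
      "analyzer": {
        "stem_with_symbols": {
          "type": "custom",
          "char_filter": ["split_plus"],
          "tokenizer": "whitespace",
          "filter": ["stemmer"]
        }
      }
    }
  }
}
```

To cover symbols other than "+", extend the character class in pattern.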