elasticsearch, searchkick

Elasticsearch stemming of words that end with symbols/special characters


I use a whitespace tokenizer with searchkick_stemmer:

"company" -> "compani"

"company+" -> "company+"

How can I make "company+" become "compani+" or ["compani", "+"]?

I've tried edge n-grams, which works, but it generates too many tokens. I'm wondering whether there is another approach, such as conditional scripting.
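For reference, the behavior can be reproduced with the `_analyze` API (using the built-in `stemmer` token filter, which should behave like `searchkick_stemmer` here):

    POST _analyze
    {
      "tokenizer": "whitespace",
      "filter": ["stemmer"],
      "text": ["company", "company+"]
    }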


Solution

  • I put together this example, but I also recommend reading the documentation for the pattern_replace character filter.

    POST _analyze
    {
      "tokenizer": "whitespace",
      "filter": [
        "stemmer"
      ],
      "char_filter": [
        {
          "type": "pattern_replace",
          "pattern": "[+]",
          "replacement": " $0"
        }
      ],
      "text": [
        "company+"
      ]
    }
    

    Tokens:

    {
      "tokens": [
        {
          "token": "compani",
          "start_offset": 0,
          "end_offset": 7,
          "type": "word",
          "position": 0
        },
        {
          "token": "+",
          "start_offset": 7,
          "end_offset": 8,
          "type": "word",
          "position": 1
        }
      ]
    }
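
    To persist this beyond a one-off `_analyze` call, the char filter can be registered in the index settings as part of a custom analyzer. A minimal sketch, where `my-index`, `plus_split`, and `my_analyzer` are placeholder names:

    PUT my-index
    {
      "settings": {
        "analysis": {
          "char_filter": {
            "plus_split": {
              "type": "pattern_replace",
              "pattern": "[+]",
              "replacement": " $0"
            }
          },
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "whitespace",
              "char_filter": ["plus_split"],
              "filter": ["stemmer"]
            }
          }
        }
      }
    }

    The char filter inserts a space before each `+` (`$0` is the matched text), so the whitespace tokenizer emits it as a separate token before stemming runs.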