elasticsearch match-phrase

How to generate multi-word search suggestions


I'm using Elasticsearch to build a small search app and am trying to figure out how to build an autocomplete feature with multi-word (phrase) suggestions. I have it working... sort of...

I get mostly single word suggestions, but when I hit the space bar - it kills the suggestions.

For example, typing "fast" works fine, but typing "fast " (with a trailing space) stops the suggestions from appearing.

I'm using edge n-grams and match_phrase_prefix, and have followed the examples here and here to build it out. I run match_phrase_prefix against the _all field and set include_in_all: false on every field except title and content. I'm starting to think it's just because I'm testing on a small data set and there simply aren't enough tokenized terms to produce multi-word suggestions. Please take a look at the relevant code below and let me know where I'm going wrong, if anywhere.

"analysis": {
  "filter": {
    "autocomplete_filter": {
      "type": "edge_ngram",
      "min_gram": "1",
      "max_gram": "20",
      "token_chars": [
        "letter",
        "digit"
      ]
    }
  },
  "analyzer": {
    "autocomplete": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": [
        "lowercase",
        "asciifolding",
        "autocomplete_filter"
      ]
    },
    "whitespace_analyzer": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": [
        "lowercase",
        "asciifolding"
      ]
    }
  }
}

Solution

  • Try the keyword tokenizer:

    "autocomplete": {
      "type": "custom",
      "tokenizer": "keyword",
      "filter": [
        "lowercase",
        "asciifolding",
        "autocomplete_filter"
      ]
    }
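To see why the tokenizer matters here, the following is a rough pure-Python stand-in for the analyzer chain, not Elasticsearch itself. It only lowercases and builds edge n-grams (asciifolding is left out), and the function names `edge_ngrams` and `analyze` are made up for this illustration. It assumes the same min_gram of 1 and max_gram of 20 as the filter in the question.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of `token` whose lengths fall in [min_gram, max_gram]."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, tokenizer):
    """Hypothetical stand-in for a custom analyzer: tokenize, lowercase, edge n-gram."""
    if tokenizer == "whitespace":
        tokens = text.split()   # one token per word
    elif tokenizer == "keyword":
        tokens = [text]         # the whole input stays a single token
    else:
        raise ValueError(f"unsupported tokenizer: {tokenizer}")
    grams = []
    for token in tokens:
        grams.extend(edge_ngrams(token.lower()))
    return grams


whitespace_grams = analyze("Fast cars", "whitespace")
keyword_grams = analyze("Fast cars", "keyword")
print(whitespace_grams)
print(keyword_grams)
```

With whitespace tokenization, no indexed term contains a space, so there is nothing for an input like "fast " to line up with. With the keyword tokenizer the n-grams run across the word boundary ("fast ", "fast c", and so on).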
    

    For reference, see "Elasticsearch mapping tokenizer keyword to avoid splitting tokens and enable use of wildcard". The keyword tokenizer is needed because the default standard analyzer splits on spaces.

    You can check the tokens an analyzer produces with a request like this:

        curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_edge_ngram_analyzer' -d 'FC Schalke 04'

    Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
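The same token check can be scripted. This is a minimal sketch using only the Python standard library, and it makes several assumptions: the cluster is at localhost:9200, the index is called test, and the analyzer is the autocomplete one from the question. It sends the analyzer name and text as a JSON body, which, as far as I know, is the form newer Elasticsearch releases expect instead of query-string parameters. The helper name `build_analyze_request` is hypothetical.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Build (but do not send) a POST request for the index's _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed values: adjust the host, index name, and analyzer name to your setup.
request = build_analyze_request("http://localhost:9200", "test", "autocomplete", "fast cars")
print(request.full_url)
print(request.data.decode("utf-8"))
# To run it against a live cluster: urllib.request.urlopen(request)
```

The request is only built here, not sent, so the snippet runs without a cluster. Passing it to urllib.request.urlopen should return the token list as JSON.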