
Elasticsearch query using match_phrase_prefix and fuzziness at the same time?


I am new to Elasticsearch, so I am struggling a bit to find the optimal query for our data.

Imagine I want to match the following name: "Handelsstandens Boldklub".

Currently, I'm using the following query:

{
  query: {
    bool: {
      should: [
        {
          match: {
            name: {
              query: query,
              slop: 5,
              type: "phrase_prefix"
            }
          }
        },
        {
          match: {
            name: {
              query: query,
              fuzziness: "AUTO",
              operator: "and"
            }
          }
        }
      ]
    }
  }
}

It currently lists the name if I search for "Hand", but if I search for "Handle" it is no longer listed, because of the typo. However, once I type all the way to "Handlesstandens" it is listed again, because the fuzziness catches the typo, but only after I have typed the whole word.

Is it somehow possible to use phrase_prefix and fuzziness at the same time, so that in the above case a typo made along the way still lists the name?

So in this case, searching for "Handle" would still match "Handelsstandens Boldklub".

Or what other workarounds are there to achieve this experience? I like phrase_prefix matching because it also supports sloppy matching (so I can search for "Boldklub han" and it will still list the result).

Or can the above be achieved by using the completion suggester?


Solution

  • Okay, so after investigating Elasticsearch further, I came to the conclusion that I should use ngrams.

    Here is a really good explanation of what they do and how they work: https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch

    Here are the settings and mapping I used (this is elasticsearch-rails syntax):
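
    To see why ngrams tolerate a typo made partway through a word, here is a minimal plain-Ruby sketch that mimics what an ngram filter emits for a single token. The `ngrams` helper is hypothetical and only approximates the real filter; it assumes grams of 2 to 20 characters and lowercased input.

    ```ruby
    # Hypothetical helper: list every substring of `token` whose length is
    # between min_gram and max_gram, roughly what an ngram token filter emits.
    def ngrams(token, min_gram: 2, max_gram: 20)
      token = token.downcase
      upper = [max_gram, token.length].min
      (min_gram..upper).flat_map do |n|
        (0..token.length - n).map { |i| token[i, n] }
      end.uniq
    end

    indexed = ngrams("Handelsstandens") # grams stored for the indexed name
    typed   = ngrams("Handle")          # grams produced from the mistyped query

    # Grams present on both sides; these are what let the query still hit.
    shared = indexed & typed
    puts shared.inspect
    ```

    The mistyped "Handle" still shares the grams "ha", "an", "nd", "han", "and" and "hand" with the indexed name. Assuming the same analyzer runs at search time and the match query keeps its default "or" operator, any shared gram is enough for the document to be returned.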

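    settings analysis: {
      filter: {
        # Emits every substring of 2..20 characters for each token.
        # Note: newer Elasticsearch versions limit max_gram - min_gram
        # through the index.max_ngram_diff index setting.
        ngram_filter: {
          type: "ngram",
          min_gram: "2",
          max_gram: "20"
        }
      },
      analyzer: {
        # Split on word boundaries, lowercase, then ngram each token.
        ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "ngram_filter"]
        }
      }
    } do
      mappings do
        # "string" is the pre-5.0 field type; Elasticsearch 5+ uses "text".
        indexes :name, type: "string", analyzer: "ngram_analyzer"
        indexes :country_id, type: "integer"
      end
    end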
    

    And the query (this query actually searches two different indexes at the same time):

    {
        query: {
          bool: {
            should: [
              {
                bool: {
                  must: [
                    { match: { "club.country_id": country.id } },
                    { match: { name: query } }
                  ]
                }
              },
              {
                bool: {
                  must: [
                    { match: { country_id: country.id } },
                    { match: { name: query } }
                  ]
                }
              }
            ],
            minimum_should_match: 1
          }
        }
    }
    

    But basically you should just use a match or multi_match query, depending on how many fields you want to search.
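
    As a rough sketch of that choice, the helper below builds either body as a Ruby hash. `name_search_body` and the `:short_name` field are hypothetical names used only for illustration; they are not part of elasticsearch-rails or of the mapping above.

    ```ruby
    # Hypothetical helper: build a search body for one or more
    # ngram-analyzed fields.
    def name_search_body(text, fields: [:name])
      if fields.length == 1
        # One field: a plain match query is enough.
        { query: { match: { fields.first => text } } }
      else
        # Several fields: multi_match runs the same text against each field.
        { query: { multi_match: { query: text, fields: fields } } }
      end
    end

    single = name_search_body("Handle")
    multi  = name_search_body("Handle", fields: [:name, :short_name])
    puts single.inspect
    puts multi.inspect
    ```

    With elasticsearch-rails, a hash like this would typically be passed to the model's search method.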

    I hope someone finds this helpful. I was personally thinking too much in terms of fuzziness instead of ngrams (which I didn't know about before), and that led me in the wrong direction.