I am new to elastic search, so I am struggling a bit to find the optimal query for our data.
Imagine I want to match the following word "Handelsstandens Boldklub".
Currently, I'm using the following query:
{
query: {
bool: {
should: [
{
match: {
name: {
query: query, slop: 5, type: "phrase_prefix"
}
}
},
{
match: {
name: {
query: query,
fuzziness: "AUTO",
operator: "and"
}
}
}
]
}
}
}
It currently list the word if I am searching for "Hand", but if I search for "Handle" the word will no longer be listed as I did a typo. However if I reach to the end with "Handlesstandens" it will be listed again, as the fuzziness will catch the typo, but only when I have typed the whole word.
Is it somehow possible to do phrase_prefix and fuzziness at the same time? So in the above case, if I make a typo on the way, it will still list the word?
So in this case, if I search for "Handle", it will still match the word "Handelsstandens Boldklub".
Or what other workarounds are there to achieve the above experience? I like the phrase_prefix matching as its also supports sloppy matching (hence I can search for "Boldklub han" and it will list the result)
Or can the above be achieved by using the completion suggester?
Okay, so after investigating elasticsearch even further, I came to the conclusion that I should use ngrams.
Here is a really good explaniation of what it does and how it works. https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
Here is the settings and mapping I used: (This is elasticsearch-rails syntax)
settings analysis: {
filter: {
ngram_filter: {
type: "ngram",
min_gram: "2",
max_gram: "20"
}
},
analyzer: {
ngram_analyzer: {
type: "custom",
tokenizer: "standard",
filter: ["lowercase", "ngram_filter"]
}
}
} do
mappings do
indexes :name, type: "string", analyzer: "ngram_analyzer"
indexes :country_id, type: "integer"
end
end
And the query: (This query actually search in two different indexes at the same time)
{
query: {
bool: {
should: [
{
bool: {
must: [
{ match: { "club.country_id": country.id } },
{ match: { name: query } }
]
}
},
{
bool: {
must: [
{ match: { country_id: country.id } },
{ match: { name: query } }
]
}
}
],
minimum_should_match: 1
}
}
}
But basically you should just do a match or multi match query, depending on how many fields you want to search in.
I hope someone find it helpful, as I was personally thinking to much in terms of fuzziness instead of ngrams (Didn't know about before). This led me in the wrong direction.