I searched for the word "form", but the exact match "form" is not first in the results. Is there any way to solve this problem?
{
"query": {
"match": {
"word": "form"
}
}
}
word score
--------------------------
formulation 10.864353
formaldehyde 10.864353
formless 10.864353
formal 10.84412
formerly 10.84412
forma 10.84412
formation 10.574185
formula 10.574185
formulate 10.574185
format 10.574185
formally 10.574185
form 10.254687
former 10.254687
formidable 10.254687
formality 10.254687
formative 10.254687
ill-formed 10.054999
in form 10.035862
pro forma 9.492243
At search time, the word "form" produces only one token, "form".
At index time, the tokens for "form" are ["f", "fo", "for", "form"]; the tokens for "formulation" are ["f", "fo", ..., "formulatio", "formulation"].
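To make the token lists above concrete, here is a minimal Python sketch of what an edge n-gram filter does to a single lowercased word. The function name `edge_ngrams` is made up for illustration and is not part of Elasticsearch; the defaults mirror the `min_gram` and `max_gram` values in the filter settings.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

# Index-time tokens for two of the words above (single-word, lowercased input).
index_tokens = {word: edge_ngrams(word) for word in ["form", "formulation"]}

# The search analyzer has no edge_ngram filter, so the query stays one token.
query_token = "form"

for word, tokens in index_tokens.items():
    # The query token is found in both token lists, so both documents match.
    print(word, tokens, query_token in tokens)
```

Both words contain the token "form" in the index, which is why both are returned for the same query.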
"edgengram_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
"analyzer": {
"abc_vocab_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"keyword_repeat",
"lowercase",
"asciifolding",
"edgengram_filter",
"unique"
]
},
"abc_vocab_search_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"keyword_repeat",
"lowercase",
"asciifolding",
"unique"
]
}
}
"word": {
"type": "text",
"analyzer": "abc_vocab_analyzer",
"search_analyzer": "abc_vocab_search_analyzer"
}
You get these results because you've applied an edge-ngram filter, and "form" is a prefix of all of those similar words. In the inverted index, the token "form" therefore also points to the documents that contain "formulation", "formal", etc.
The relevance score is therefore computed the same way for all of those documents. You can refer to this link; I'd specifically suggest going through the sections Default Similarity and BM25. Although the present default similarity is BM25, that link would help you understand how scoring works.
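As a rough illustration of why the exact match gets no advantage, here is a sketch of the textbook BM25 score for a single term. The formula only sees term frequency, field length, and corpus statistics; it has no notion of whether the indexed word equals the query or merely starts with it. The corpus numbers are made up, and treating both single-word fields as the same length is an assumption of this sketch, not something measured on your index.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one query term for one document."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)

# Made-up corpus statistics: 1000 documents, 19 of them contain the token "form".
stats = dict(avg_doc_len=1.0, doc_count=1000, doc_freq=19)

# Assumption: both single-word fields count as the same length.
exact = bm25_term_score(tf=1, doc_len=1, **stats)      # document "form"
prefix = bm25_term_score(tf=1, doc_len=1, **stats)     # document "formulation"
two_words = bm25_term_score(tf=1, doc_len=2, **stats)  # document "in form"

print(exact == prefix)    # identical inputs, identical score
print(two_words < exact)  # a longer field scores lower for the same term
```

With identical inputs the exact match and the prefix match are indistinguishable to the scorer, so something else has to reward the exact match.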
You would need to create another sibling field that you can use in a should clause. You could create a keyword sub-field and query it with a Term Query, but you would need to be careful about case sensitivity.
Instead, as mentioned by @Val, you can create a sibling text field with the standard analyzer:
{
  "word": {
    "type": "text",
    "analyzer": "abc_vocab_analyzer",
    "search_analyzer": "abc_vocab_search_analyzer",
    "fields": {
      "standard": {
        "type": "text"
      }
    }
  }
}
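The case-sensitivity concern with a keyword sub-field can be illustrated with a small Python sketch. The helper names are hypothetical, the regex is only a rough stand-in for the standard analyzer, and "Form" is a made-up indexed value.

```python
import re

def standard_tokens(text):
    # Rough approximation of the standard analyzer: split into words, lowercase.
    return re.findall(r"\w+", text.lower())

def term_query_on_keyword(indexed_value, query):
    # A keyword field stores the value as one unmodified term, and a term
    # query is not analyzed, so the two strings must be exactly equal.
    return query == indexed_value

def match_query_on_standard(indexed_value, query):
    # A match query analyzes the query text the same way as the field.
    indexed = standard_tokens(indexed_value)
    return any(token in indexed for token in standard_tokens(query))

print(term_query_on_keyword("Form", "form"))    # case differs, no match
print(match_query_on_standard("Form", "form"))  # both sides lowercased, match
```

This is why the text sub-field with the standard analyzer is the more forgiving choice here.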
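Then add a should clause on word.standard to the query (note the should part), so that exact matches get an extra score:
POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "word": "form"
          }
        }
      ],
      "should": [
        {
          "match": {
            "word.standard": "form"
          }
        }
      ]
    }
  }
}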
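To see the effect of the must plus should combination, here is a toy Python simulation. It is only a sketch: the constant scores of 1.0 are placeholders for the real BM25 scores, the helper names are hypothetical, and the regex is a rough stand-in for the standard analyzer.

```python
import re

def edge_ngrams(token, min_gram=1, max_gram=20):
    # Prefixes of the token, as the index-time edge_ngram filter would emit.
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}

def words(text):
    # Rough approximation of the standard tokenizer plus lowercase filter.
    return re.findall(r"\w+", text.lower())

def toy_score(doc, query):
    q = query.lower()
    # must clause: match on "word" (edge n-grams at index time).
    if not any(q in edge_ngrams(w) for w in words(doc)):
        return 0.0  # must clause failed, the document would be excluded
    # should clause: match on "word.standard" (whole words only).
    bonus = 1.0 if q in words(doc) else 0.0
    return 1.0 + bonus  # placeholder scores, not real BM25 values

docs = ["formulation", "formal", "form", "former", "in form", "pro forma"]
ranked = sorted(docs, key=lambda d: toy_score(d, "form"), reverse=True)
print(ranked)
```

Only documents that contain "form" as a whole word pick up the extra score, so they move ahead of the prefix-only matches. Note that "in form" also gets the bonus, since it contains the whole word too.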
Let me know if this helps!