From what I've understood, When I index a document say:
PUT <index>/_doc/1
{
"title":"black white fox cat"
}
Elastic search analyzes this via a standard analyzer and turns the title into an array of tokens.
But then when I search for this document let's say
POST <index>/_search
{
"query":
{
"match":
{
"title":"black"
}
}
}
It analyzez again via the same analyzer, isn't that inefficient?
It's not efficient, its necessary step to provide the search results. let me explain under the hood, how search and index process works.
text
based on data type, and configured analyzer and index the tokens into the inverted index.There are various use-cases, like when you use the keywords data type, text
is not analyzed and when you use term level queries search time analysis doesn't happen.
Now, important thing to not is that during search time also same analyzer used at index time, otherwise it would end up generating different token which not produce match in step-3 Described earlier.