azurelucenefull-text-searchazure-cognitive-searchlexical-analysis

Azure Cognitive Search - When would you use different search and index analyzers?


I'm trying to understand what is the purpose of configuring a different analyzer for searching and indexing in Azure Search. See: https://learn.microsoft.com/en-us/rest/api/searchservice/create-index#-field-definitions-

According to my understanding, the job of the indexing analyzer is to breakup the input document into individual tokens. Through this process, it might apply multiple transformations like lower-casing the content, removing punctuation and white-spaces, and even removing entire words.

If the tokens are already processed, what is the use of the search analyzer?

Initially, I thought it would apply a similar process on the search query itself, but wouldn't setting a different analyzer than the one used to index the document at this stage completely breaks the search results? If the indexing analyzer lower-cased everything, but the search analyzer doesn't lower-case the query, wouldn't that means you'll never get matches for queries with upper case characters? What if the search analyzer doesn't split tokens on white-spaces? Won't you ever get a match the moment the query includes a space?

Assuming that this is indeed how the two analyzers works together, then why would you ever want to set two different ones?


Solution

  • Your understanding of the difference between index and search analyzer is correct. An example scenario where that's valuable is using ngrams for indexing but not for search terms. So this would allow a document with "cat" to produce "c", "ca", "cat" but you wouldn't necessarily want to apply ngrams on the search term as that would make the query less performant and isn't necessary since the documents already produced the ngrams. Hopefully that makes sense!