I was trying out Phonetic search using Azure Search without much luck. My objective is to work out an Index configuration that can handle typos and accommodate phonetic search for end users.
With the below configuration and sample data, I was trying to search for intentionally misspelled words like 'softvare' or 'alek'. I got results for 'alek', thanks to the Phonetic analyzer; but didn't get any results for 'softvare'.
Looks like for this requirement phonetic search will not do the trick.
Only option that I found was to use synonym map. The major pitfall is that I'm unable to use the Phonetics / Custom analyzer along with Synonyms.
What are the various strategies that you would recommend for taking care of typos?
?api-version=2017-11-11&search=alec
?api-version=2017-11-11&search=softvare
"name": "phonetichotels",
"fields": [
{"name": "hotelId", "type": "Edm.String", "key":true, "searchable": false},
{"name": "baseRate", "type": "Edm.Double"},
{"name": "description", "type": "Edm.String", "filterable": false, "sortable": false, "facetable": false, "analyzer":"my_standard"},
{"name": "hotelName", "type": "Edm.String", "analyzer":"my_standard"},
{"name": "category", "type": "Edm.String", "analyzer":"my_standard"},
{"name": "tags", "type": "Collection(Edm.String)", "analyzer":"my_standard"},
{"name": "parkingIncluded", "type": "Edm.Boolean"},
{"name": "smokingAllowed", "type": "Edm.Boolean"},
{"name": "lastRenovationDate", "type": "Edm.DateTimeOffset"},
{"name": "rating", "type": "Edm.Int32"},
{"name": "location", "type": "Edm.GeographyPoint"}
],
"analyzers":[
{
"name":"my_standard",
"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[ "lowercase", "asciifolding", "phonetic" ]
}
]
{
"analyzer":"my_standard",
"text": "software"
}
{
"@odata.context": "https://ctsazuresearchpoc.search.windows.net/$metadata#Microsoft.Azure.Search.V2017_11_11.AnalyzeResult",
"tokens": [
{
"token": "SFTW",
"startOffset": 0,
"endOffset": 8,
"position": 0
}
]
}
{
"analyzer":"my_standard",
"text": "softvare"
}
{
"@odata.context": "https://ctsazuresearchpoc.search.windows.net/$metadata#Microsoft.Azure.Search.V2017_11_11.AnalyzeResult",
"tokens": [
{
"token": "SFTF",
"startOffset": 0,
"endOffset": 8,
"position": 0
}
]
}
{
"@search.action": "upload",
"hotelId": "5",
"baseRate": 199.0,
"description": "Best hotel in town for software people",
"hotelName": "Fancy Stay",
"category": "Luxury",
"tags": ["pool", "view", "wifi", "concierge"],
"parkingIncluded": false,
"smokingAllowed": false,
"lastRenovationDate": "2010-06-27T00:00:00Z",
"rating": 5,
"location": { "type": "Point", "coordinates": [-122.131577, 47.678581] }
},
{
"@search.action": "upload",
"hotelId": "6",
"baseRate": 79.99,
"description": "Cheapest hotel in town ",
"hotelName": " Alec Baldwin Motel",
"category": "Budget",
"tags": ["motel", "budget"],
"parkingIncluded": true,
"smokingAllowed": true,
"lastRenovationDate": "1982-04-28T00:00:00Z",
"rating": 1,
"location": { "type": "Point", "coordinates": [-122.131577, 49.678581] }
},
With the right configuration, I should have got results even with the misspelled words.
As you could read in my post; my Objective was to handle the typos.
The only easy option is to use the inbuilt Lucene functionality - Fuzzy Search. I'm yet to check on the response times as the querytype has to be set to 'full' for using fuzzy search. Otherwise, the results were satisfactory.
Example:
search=softvare~&fuzzy=true&querytype=full
will return all documents with the 'Software' in it.
For further reading please go through Documentation