My Elasticsearch documents have a field Name with entries like:
Samsung Galaxy S3
Samsung Galaxy Ace Duos 3
Samsung Galaxy Duos 3
Samsung Galaxy S2
Samsung Galaxy S (I9000)
On querying this field with the following query (notice the space between "s" and "3"):
{
  "query": {
    "match": {
      "Name": {
        "query": "galaxy s 3",
        "fuzziness": 2,
        "prefix_length": 1
      }
    }
  }
}
It returns "Samsung Galaxy Duos 3" as a relevant result, and not "Samsung Galaxy S3".
What seems to be needed here is to disregard the space between a number and a single alphabetic character when matching. For example, "I-phone 5s" should also be returned by a query for "I-phone 5 s".
Is there a nice way to accomplish this?
You need to change your analyser so that it also breaks the string wherever text changes to a number or a number changes to text. A pattern analyser with a regular expression does this (the pattern is based on the camelcase analyser):
curl -XPUT 'localhost:9200/myindex/' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "mynewanalyser": {
          "type": "pattern",
          "pattern": "([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"
        }
      }
    }
  }
}'
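The settings above only define the analyser; the Name field also has to be mapped to use it before documents are indexed. A minimal sketch of a combined index-creation body follows, built in Python so it can be printed and checked. The mapping type name "product" and the helper build_index_body are hypothetical, and the "string" field type is an assumption that matches the older Elasticsearch versions implied by the requests here (newer versions use "text" and drop the type level).

```python
import json

ANALYZER_NAME = "mynewanalyser"
# Same pattern as in the settings above; json.dumps adds the JSON escaping.
PATTERN = r"([^\p{L}\d]+)|(?<=\D)(?=\d)|(?<=\d)(?=\D)"


def build_index_body(field="Name", doc_type="product"):
    """Return an index-creation body that defines the analyser and maps `field` to it.

    `doc_type` is a hypothetical mapping type name; "string" is assumed
    as the field type for an older Elasticsearch release.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    ANALYZER_NAME: {"type": "pattern", "pattern": PATTERN}
                }
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    field: {"type": "string", "analyzer": ANALYZER_NAME}
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the body that would be sent with the PUT request.
    print(json.dumps(build_index_body(), indent=2))
```

The printed JSON could be used as the body of the PUT request in place of the settings-only version, provided the index is created fresh and the documents are indexed afterwards.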
Testing the new analyser with your string:
curl -XGET 'localhost:9200/myindex/_analyze?analyzer=mynewanalyser&pretty' -d 'Samsung Galaxy S3'
{
"tokens" : [ {
"token" : "samsung",
"start_offset" : 0,
"end_offset" : 7,
"type" : "word",
"position" : 1
}, {
"token" : "galaxy",
"start_offset" : 8,
"end_offset" : 14,
"type" : "word",
"position" : 2
}, {
"token" : "s",
"start_offset" : 15,
"end_offset" : 16,
"type" : "word",
"position" : 3
}, {
"token" : "3",
"start_offset" : 16,
"end_offset" : 17,
"type" : "word",
"position" : 4
} ]
}
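To see where the pattern splits without a running cluster, the tokenisation can be approximated locally. This is only a sketch under a few assumptions: Python's re module has no \p{L}, so [\W_]+ stands in for [^\p{L}\d]+ (close enough for ordinary letters and digits, not an exact equivalent), splitting on empty matches needs Python 3.7 or later, and tokenize is a hypothetical helper rather than anything Elasticsearch provides. Lowercasing and dropping empty pieces are done by hand to mirror the tokens shown in the output above.

```python
import re

# Approximation of the analyser pattern:
#   [\W_]+           runs of characters that are neither letters nor digits
#   (?<=\D)(?=\d)    empty position where a non-digit is followed by a digit
#   (?<=\d)(?=\D)    empty position where a digit is followed by a non-digit
SPLIT_PATTERN = re.compile(r"[\W_]+|(?<=\D)(?=\d)|(?<=\d)(?=\D)")


def tokenize(text):
    """Lowercase `text`, split it on the pattern, and drop empty pieces."""
    return [piece for piece in SPLIT_PATTERN.split(text.lower()) if piece]


if __name__ == "__main__":
    for sample in ("Samsung Galaxy S3", "galaxy s 3", "Samsung Galaxy Duos 3"):
        print(sample, "->", tokenize(sample))
```

With this split, "Samsung Galaxy S3" and the query "galaxy s 3" both produce the tokens "galaxy", "s" and "3", while "Samsung Galaxy Duos 3" only shares "galaxy" and "3" with the query.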