I want to build an autocomplete feature using Elasticsearch and C#, but I am not getting the desired result. For demo purposes, this is what I have done.
1) Created an index called "names":
PUT names?pretty
2) Added 20 entries using the POST command, for example:
POST names/_doc/1
{
  "name": "John Smith"
}
3) List of Names:
[ "John Smith", "John Smitha", "John Smithb", "John Smithc", "John Smithd", "John Smithe", "John Smithf",
"John Smithg", "John Smithh", "John Smithi", "Smith John", "Smitha John", "Smithb John", "Smithc John",
"Smithd John", "Smithe John", "Smithf John", "Smithg John", "Smithh John", "Smithi John" ]
4) When I run a prefix query:
GET names/_search
{
  "query": {
    "prefix": {
      "name": {
        "value": "Smith"
      }
    }
  }
}
I expect to get back "Smith John", "Smitha John", ... but I am getting back "John Smith", "John Smitha", ...
What am I doing wrong? What do I need to change and where?
You are defining your name field as a text field, which by default uses the standard analyzer: it splits the text into tokens and converts them to lowercase. You can test this with the analyze API of ES.
URL: http://{{hostname}}:{{port}}/{{index}}/_analyze
{
  "text": "John Smith",
  "analyzer": "keyword"
}
The output of the above API:
{
  "tokens": [
    {
      "token": "John Smith",
      "start_offset": 0,
      "end_offset": 10,
      "type": "word",
      "position": 0
    }
  ]
}
Notice that the keyword analyzer does not break up the text; it stores it exactly as it is, as explained in the official ES docs. For comparison, the same request with the standard analyzer:
{
  "text": "John Smith",
  "analyzer": "standard"
}
The output of the above API:
{
  "tokens": [
    {
      "token": "john",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "smith",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
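As a rough illustration of the difference between the two analyzers, here is a toy model in Python. The function names keyword_analyze and standard_analyze are made up for this sketch, and the regex tokenizer only approximates what the real standard analyzer does for simple ASCII names like these; it is not how ES implements analysis.

```python
import re

def keyword_analyze(text):
    """Toy stand-in for the keyword analyzer: the whole input becomes one token."""
    return [{"token": text, "start_offset": 0, "end_offset": len(text), "position": 0}]

def standard_analyze(text):
    """Toy stand-in for the standard analyzer: split into words, then lowercase."""
    return [
        {
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        }
        for position, match in enumerate(re.finditer(r"\w+", text))
    ]

print(keyword_analyze("John Smith"))   # one token: "John Smith"
print(standard_analyze("John Smith"))  # two tokens: "john", "smith"
```

The point to take away is that with the standard analyzer the index no longer contains anything that starts with a capital S, and that every word becomes its own token.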
A prefix query, however, is not analyzed; its value is sent to ES as it is. So Smith (note the capital S) is matched against the indexed tokens unchanged. With the updated mapping below, which uses the keyword analyzer, each name is indexed as a single, unmodified token, so only documents whose name starts with Smith have that prefix, and only these come back in the search results.
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "keyword"
      }
    }
  }
}
And the search query:
{
  "query": {
    "prefix": {
      "name": {
        "value": "Smith"
      }
    }
  }
}
EDIT: Updated the settings based on the OP's comments. With the above mapping and search query, only results that start with Smith are returned, as shown in the output below:
{
  "took": 811,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "6",
        "_score": 1.0,
        "_source": {
          "name": "Smith John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "7",
        "_score": 1.0,
        "_source": {
          "name": "Smithb John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "8",
        "_score": 1.0,
        "_source": {
          "name": "Smithc John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "9",
        "_score": 1.0,
        "_source": {
          "name": "Smithd John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "10",
        "_score": 1.0,
        "_source": {
          "name": "Smithe John"
        }
      }
    ]
  }
}
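To make the reasoning concrete, the matching can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: keyword_tokens, standard_tokens and prefix_search are hypothetical helpers, the regex is a crude stand-in for the standard tokenizer, and a prefix query is modelled as "some indexed token starts with the raw, unanalyzed value".

```python
import re

NAMES = ["John Smith", "John Smitha", "Smith John", "Smitha John"]

def keyword_tokens(text):
    """Keyword analyzer (toy): index the whole name as one unchanged token."""
    return [text]

def standard_tokens(text):
    """Standard analyzer (toy): index each word separately, lowercased."""
    return [word.lower() for word in re.findall(r"\w+", text)]

def prefix_search(names, analyze, prefix):
    """Return names with at least one indexed token starting with the raw prefix."""
    return [
        name for name in names
        if any(token.startswith(prefix) for token in analyze(name))
    ]

# Keyword analyzer: only names that begin with "Smith" match.
print(prefix_search(NAMES, keyword_tokens, "Smith"))
# ['Smith John', 'Smitha John']

# Standard analyzer: any word in the name can match, so word order is lost.
print(prefix_search(NAMES, standard_tokens, "smith"))
# ['John Smith', 'John Smitha', 'Smith John', 'Smitha John']
```

With one token per word, "John Smith" is a hit because its second word matches; with one token per name, the prefix has to match from the very first character, which is what an autocomplete on the start of the name needs.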