elasticsearch, elasticsearch-analyzers

ElasticSearch starts with query for autocomplete feature


I want to build an autocomplete feature using ElasticSearch and C#, but I am not getting the desired result. For demo purposes, this is what I have done.

1) Created index called "names":

PUT names?pretty

2) Added 20 entries using POST command:

POST names/_doc/1
{
  "name" : "John Smith"
}

3) List of Names:

[ "John Smith", "John Smitha", "John Smithb", "John Smithc", "John Smithd", "John Smithe", "John Smithf",
  "John Smithg", "John Smithh", "John Smithi", "Smith John", "Smitha John", "Smithb John", "Smithc John",
  "Smithd John", "Smithe John", "Smithf John", "Smithg John", "Smithh John", "Smithi John" ]
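
The 20 documents do not have to be posted one at a time. As a minimal sketch, the list above can be generated and serialized into a body for the `_bulk` API, which expects newline-delimited JSON: one action line followed by one source line per document, ending with a newline. The helper names `build_names` and `build_bulk_body` are hypothetical, and the explicit IDs 1 to 20 are an assumption.

```python
import json
import string


def build_names():
    """Return the 20 demo names: 'John Smith<x>' and 'Smith<x> John'."""
    suffixes = [""] + list(string.ascii_lowercase[:9])  # "", "a" .. "i"
    john_first = [f"John Smith{s}" for s in suffixes]
    smith_first = [f"Smith{s} John" for s in suffixes]
    return john_first + smith_first


def build_bulk_body(index, names):
    """Serialize the names as NDJSON: an action line, then a source line."""
    lines = []
    for doc_id, name in enumerate(names, start=1):
        # Action line: index this document into `index` under an explicit ID (assumed).
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself, same shape as in step 2.
        lines.append(json.dumps({"name": name}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body("names", build_names()), end="")
```

The printed text could then be sent as the body of `POST _bulk` with the `Content-Type: application/x-ndjson` header.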

4) When I run a prefix query:

GET names/_search
{
  "query": {
    "prefix": {
      "name": {
        "value": "Smith"
      }
    }
  }
}

I expect to get back "Smith John", "Smitha John"... But I am getting back "John Smith", "John Smitha"...

What am I doing wrong? What do I need to change and where?


Solution

  • Your name field is mapped as a text field, which by default uses the standard analyzer. That analyzer splits the text into separate tokens and converts them to lowercase. You can verify this with the analyze API of ES.

    Tokens with the keyword analyzer

    URL :- http://{{hostname}}:{{port}}/{{index}}/_analyze

    {
      "text": "John Smith",
      "analyzer" : "keyword"
    }
    

    The output of the above API:

    {
        "tokens": [
            {
                "token": "John Smith",
                "start_offset": 0,
                "end_offset": 10,
                "type": "word",
                "position": 0
            }
        ]
    }
    

    Notice that the keyword analyzer does not split the text; it stores the whole value as a single token, exactly as given, as explained in the official ES documentation.

    Tokens with standard analyzer

    {
      "text": "John Smith",
      "analyzer" : "standard"
    }
    

    The output of the above API:

    {
        "tokens": [
            {
                "token": "john",
                "start_offset": 0,
                "end_offset": 4,
                "type": "<ALPHANUM>",
                "position": 0
            },
            {
                "token": "smith",
                "start_offset": 5,
                "end_offset": 10,
                "type": "<ALPHANUM>",
                "position": 1
            }
        ]
    }
    

    The prefix query is not analyzed: the search term is sent to ES as it is, so Smith (note the capital S) is used for token matching unchanged. With the updated mapping below, each name is indexed as a single token, so only documents whose name starts with Smith have that prefix, and only those come back in the search results.
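
    The difference between the two analyzers can be illustrated with a simplified model in plain Python. This is only a sketch of the idea, not how Elasticsearch is implemented: `standard_tokens` is a rough stand-in for the standard analyzer (split on non-word characters, lowercase), `keyword_tokens` keeps the value as one token, and `prefix_hits` is a hypothetical helper that matches a prefix against each token. The lowercase prefix `smith` in the first call is an assumption chosen so that it can be compared with the lowercase tokens.

```python
import re


def standard_tokens(text):
    # Rough stand-in for the standard analyzer: split into words, lowercase them.
    return re.findall(r"\w+", text.lower())


def keyword_tokens(text):
    # Stand-in for the keyword analyzer: the whole value is a single token.
    return [text]


def prefix_hits(names, analyze, prefix):
    # A name matches if ANY of its tokens starts with the prefix.
    return [n for n in names if any(t.startswith(prefix) for t in analyze(n))]


names = ["John Smith", "John Smitha", "Smith John", "Smitha John"]

# Word-level tokens: every name contains a token starting with "smith".
print(prefix_hits(names, standard_tokens, "smith"))
# → ['John Smith', 'John Smitha', 'Smith John', 'Smitha John']

# Single-token values: only names that start with "Smith" match.
print(prefix_hits(names, keyword_tokens, "Smith"))
# → ['Smith John', 'Smitha John']
```

    In this model, matching happens per token, which is why "John Smith" can match a prefix that only its second word satisfies, while a single-token field matches only on the start of the whole name.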

    Mapping

    {
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    "analyzer": "keyword"
                }
            }
        }
    }
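
    The analyzer of an existing field cannot be changed in place, so this mapping has to be supplied when the index is created, and the documents indexed again afterwards. As a hedged sketch, the create-index request could be built in Python with only the standard library. The base URL `http://localhost:9200`, the index name `names`, and the helper name `build_create_index_request` are assumptions; the request is only constructed here, not sent.

```python
import json
import urllib.request

# The mapping from the answer: index "name" as one token via the keyword analyzer.
MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text", "analyzer": "keyword"}
        }
    }
}


def build_create_index_request(base_url, index):
    """Build (but do not send) a PUT <base_url>/<index> request carrying the mapping."""
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(MAPPING).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Assumed local node and index name; adjust both for a real cluster.
    req = build_create_index_request("http://localhost:9200", "names")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```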
    

    Search Query

    {
        "query": {
            "prefix": {
                "name": {
                    "value": "Smith"
                }
            }
        }
    }
    

    EDIT: Updated the setting based on the OP's comments. With the above mapping and search query, only the names that start with Smith are returned, as shown in the output below:

    {
      "took": 811,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 5,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "59977669",
            "_type": "_doc",
            "_id": "6",
            "_score": 1.0,
            "_source": {
              "name": "Smith John"
            }
          },
          {
            "_index": "59977669",
            "_type": "_doc",
            "_id": "7",
            "_score": 1.0,
            "_source": {
              "name": "Smithb John"
            }
          },
          {
            "_index": "59977669",
            "_type": "_doc",
            "_id": "8",
            "_score": 1.0,
            "_source": {
              "name": "Smithc John"
            }
          },
          {
            "_index": "59977669",
            "_type": "_doc",
            "_id": "9",
            "_score": 1.0,
            "_source": {
              "name": "Smithd John"
            }
          },
          {
            "_index": "59977669",
            "_type": "_doc",
            "_id": "10",
            "_score": 1.0,
            "_source": {
              "name": "Smithe John"
            }
          }
        ]
      }
    }