
ElasticSearch Order By String Length


I am using ElasticSearch via NEST (C#). I have a large list of information about people, stored as documents like this:

{
   "firstName": "Frank",
   "lastName": "Jones",
   "City": "New York"
}

I'd like to filter this list by lastName and order the results by the length of lastName, so that people whose last name has only 5 characters appear at the beginning of the result set, followed by people with 10 characters.

So, in pseudocode, I'd like to do something like:

    list.wildcard("j*").sort(m => lastName.length)
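Ignoring Elasticsearch entirely, the intended result can be sketched in plain Python like this. The sample people and the function name wildcard_sorted_by_length are hypothetical, and lowercasing the name before matching is an assumption that loosely mirrors a wildcard query run against a lowercased (analyzed) field.

```python
from fnmatch import fnmatchcase

# Hypothetical sample documents shaped like the one above.
people = [
    {"firstName": "Frank", "lastName": "Jones", "City": "New York"},
    {"firstName": "Ann", "lastName": "Johansson", "City": "Boston"},
    {"firstName": "Bill", "lastName": "Smith", "City": "Denver"},
    {"firstName": "Carla", "lastName": "Jiang", "City": "Austin"},
]


def wildcard_sorted_by_length(items, pattern):
    """Keep items whose lowercased lastName matches the shell-style pattern,
    then order them by lastName length, shortest first.

    sorted() is stable, so names of equal length keep their input order."""
    matches = [p for p in items if fnmatchcase(p["lastName"].lower(), pattern)]
    return sorted(matches, key=lambda p: len(p["lastName"]))


for person in wildcard_sorted_by_length(people, "j*"):
    print(person["lastName"], len(person["lastName"]))
```

Smith is dropped by the "j*" pattern, and the two 5-character names come before the 9-character one.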


Solution

  • You can sort by string length with script-based sorting.

    As a toy example, I set up a trivial index with a few documents:

    PUT /test_index
    
    POST /test_index/doc/_bulk
    {"index":{"_id":1}}
    {"name":"Bob"}
    {"index":{"_id":2}}
    {"name":"Jeff"}
    {"index":{"_id":3}}
    {"name":"Darlene"}
    {"index":{"_id":4}}
    {"name":"Jose"}
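The bulk request body above is newline-delimited JSON: an action line followed by a source line for each document, with a newline after the last line. A small Python sketch that generates the same body from a list of names (bulk_body is a hypothetical helper, not part of any client library):

```python
import json


def bulk_body(names):
    """Build an NDJSON body for the _bulk endpoint: one {"index": ...}
    action line and one source line per name, with _id values numbered
    from 1 as in the example above."""
    compact = {"separators": (",", ":")}  # no spaces, matching the lines above
    lines = []
    for doc_id, name in enumerate(names, start=1):
        lines.append(json.dumps({"index": {"_id": doc_id}}, **compact))
        lines.append(json.dumps({"name": name}, **compact))
    # The bulk API expects the final line to be terminated by a newline.
    return "\n".join(lines) + "\n"


print(bulk_body(["Bob", "Jeff", "Darlene", "Jose"]), end="")
```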
    

    Then I can order search results like this:

    POST /test_index/_search
    {
       "query": {
          "match_all": {}
       },
       "sort": {
          "_script": {
             "script": "doc['name'].value.length()",
             "type": "number",
             "order": "asc"
          }
       }
    }
    ...
    {
       "took": 2,
       "timed_out": false,
       "_shards": {
          "total": 5,
          "successful": 5,
          "failed": 0
       },
       "hits": {
          "total": 4,
          "max_score": null,
          "hits": [
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "1",
                "_score": null,
                "_source": {
                   "name": "Bob"
                },
                "sort": [
                   3
                ]
             },
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "4",
                "_score": null,
                "_source": {
                   "name": "Jose"
                },
                "sort": [
                   4
                ]
             },
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "2",
                "_score": null,
                "_source": {
                   "name": "Jeff"
                },
                "sort": [
                   4
                ]
             },
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "3",
                "_score": null,
                "_source": {
                   "name": "Darlene"
                },
                "sort": [
                   7
                ]
             }
          ]
       }
    }
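The request above uses Elasticsearch 1.x syntax, where the script is a bare string. On Elasticsearch 6.x and later (an assumption about your version), the script is an object, the default language is Painless, and a string field created by dynamic mapping is a text field whose keyword sub-field (assumed here to be lastName.keyword) is what doc[...] can read. The Python sketch below only builds such a request body, combining a wildcard query on lastName with the length sort as the question asks; the helper name length_sorted_wildcard_search is hypothetical. Because equal-length names (Jose and Jeff above) come back in no guaranteed order, the sketch adds the keyword value itself as a tie-breaker.

```python
import json


def length_sorted_wildcard_search(field, pattern, order="asc"):
    """Build a search body that keeps documents whose `field` matches a
    wildcard pattern and sorts them by the length of `field`.

    Assumes Elasticsearch 6.x+ (Painless, "source" key) and a dynamic
    mapping that added a `<field>.keyword` sub-field."""
    keyword = f"{field}.keyword"
    return {
        # The wildcard runs against the analyzed text field, whose terms are
        # lowercased, so a lowercase pattern such as "j*" matches "Jones".
        "query": {"wildcard": {field: pattern}},
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "script": {
                        "lang": "painless",
                        "source": f"doc['{keyword}'].value.length()",
                    },
                    "order": order,
                }
            },
            # Tie-breaker so equal-length names come back in a stable order.
            {keyword: order},
        ],
    }


print(json.dumps(length_sorted_wildcard_search("lastName", "j*"), indent=2))
```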
    

    To filter by length, I can use a script filter in a similar way:

    POST /test_index/_search
    {
       "query": {
          "filtered": {
             "query": {
                "match_all": {}
             },
             "filter": {
                "script": {
                   "script": "doc['name'].value.length() > 3",
                   "params": {}
                }
             }
          }
       },
       "sort": {
          "_script": {
             "script": "doc['name'].value.length()",
             "type": "number",
             "order": "asc"
          }
       }
    }
    ...
    {
       "took": 3,
       "timed_out": false,
       "_shards": {
          "total": 5,
          "successful": 5,
          "failed": 0
       },
       "hits": {
          "total": 3,
          "max_score": null,
          "hits": [
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "4",
                "_score": null,
                "_source": {
                   "name": "Jose"
                },
                "sort": [
                   4
                ]
             },
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "2",
                "_score": null,
                "_source": {
                   "name": "Jeff"
                },
                "sort": [
                   4
                ]
             },
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "3",
                "_score": null,
                "_source": {
                   "name": "Darlene"
                },
                "sort": [
                   7
                ]
             }
          ]
       }
    }
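The filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0; its replacement is a bool query with a filter clause. A hedged Python sketch of the equivalent query clause follows, assuming Elasticsearch 6.x or later with Painless and a name.keyword sub-field from dynamic mapping (min_length_query is a hypothetical helper). It would take the place of the "filtered" object under "query", alongside a sort clause written for the same version.

```python
import json


def min_length_query(field, min_length):
    """Build a bool query whose filter keeps documents where the keyword
    value of `field` is longer than `min_length` characters.

    Assumes Elasticsearch 6.x+ with Painless and a `<field>.keyword` sub-field."""
    return {
        "bool": {
            "filter": {
                "script": {
                    "script": {
                        "lang": "painless",
                        # The threshold is passed as a parameter so the
                        # compiled script can be reused for other values.
                        "source": f"doc['{field}.keyword'].value.length() > params.min_length",
                        "params": {"min_length": min_length},
                    }
                }
            }
        }
    }


print(json.dumps({"query": min_length_query("name", 3)}, indent=2))
```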
    

    Here's the code I used:

    http://sense.qbox.io/gist/22fef6dc5453eaaae3be5fb7609663cc77c43dab

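    P.S.: If any of the last names contain spaces, you will probably want to map that field with "index": "not_analyzed". With an analyzed field, doc['name'].value reads a single indexed token rather than the whole original string, so a name like "Van Der Berg" would be measured by just one of its words.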
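To see why the mapping matters, here is a rough Python emulation of what the length script would see for a multi-word name. It assumes a crude stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters) and that, for an analyzed field, doc['name'].value returns the smallest of the document's sorted tokens; both are simplifications for illustration, and the function names are hypothetical.

```python
import re


def approx_standard_tokens(text):
    """Crude stand-in for the standard analyzer: lowercase, then split on
    runs of characters that are not ASCII letters or digits."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def script_seen_length(text, analyzed):
    """Approximate length that doc['name'].value.length() would report."""
    if not analyzed:
        # not_analyzed: the whole original string is one indexed term.
        return len(text)
    tokens = sorted(approx_standard_tokens(text))
    # Assumption: .value returns the first token in sorted order.
    return len(tokens[0]) if tokens else 0


print(script_seen_length("Van Der Berg", analyzed=False))  # 12
print(script_seen_length("Van Der Berg", analyzed=True))   # 4, from "berg"
```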