elasticsearch, spatial-index

Elasticsearch spatial search with terms


I am running a query against 140+ million documents with spatial data. Purely spatial queries are extremely fast (under 1 s). Adding a wildcard clause to the same geometry query pushes the response time to roughly 10-20 s. I expect wildcard queries to take some time, but I would like to know whether there is a better way to write the query, or a way to make Elasticsearch filter the documents down to the geometry first and only then look for wildcard matches. Alternatively, could I run the spatial query and then run the wildcard against the resulting doc IDs? Any ideas that would speed things up for the end user would be appreciated.

GET parcels/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "name.keyword": {
              "value": "*smith*"
            }
          }
        },
        {
          "bool": {
            "filter": [
              {
                "geo_shape": {
                  "shape": {
                    "shape": {
                      "type": "POLYGON",
                      "coordinates": [
                        [
                          [
                            -81.09980486601305,
                            32.063655184739936
                          ],
                          [
                            -81.09980486601168,
                            32.05639855631687
                          ],
                          [
                            -81.09128330779276,
                            32.05639855631687
                          ],
                          [
                            -81.09128330779276,
                            32.06365489826756
                          ],
                          [
                            -81.09980486601305,
                            32.063655184739936
                          ]
                        ]
                      ]
                    },
                    "relation": "intersects"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "size": 10000
}

Our settings for the index:

{
...
"analysis": {
    "normalizer": {
        "search_normalizer": {
            "filter": [
                "uppercase"
            ],
            "type": "custom"
         }
     }
},
"number_of_shards": 8,
"number_of_replicas": 1,

Mapping for the 'name' field:

"name": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "normalizer": "search_normalizer"
        }
    }
},

Running ES 7.10 (5 nodes, each with 8 GB RAM).

Not searching by wildcard is not an option.

Any help is appreciated.


Solution

  • Running a wildcard query with a leading wildcard (as in *smith*) on a keyword field is a performance killer!

    If you absolutely need this kind of functionality, you need to leverage the new wildcard field type which is meant exactly for this kind of use.

    So you can either add another sub-field or change the keyword sub-field to a `wildcard` sub-field.

    You can see how it works under the hood in the blog article where the wildcard field was described when it came out.
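
As a rough sketch of the first option (adding another sub-field), the requests below add a wildcard-typed sub-field next to the existing keyword one and then query it together with the geometry. The sub-field name `name.wildcard` is only an illustrative choice, not something from the question. The sketch also assumes that `case_insensitive` (added to the wildcard query in 7.10) is an acceptable stand-in for the uppercase normalizer, and that placing both clauses in `filter` context is fine because relevance scoring is not needed. None of this has been timed against the index described above.

```
# Add a wildcard-typed sub-field alongside the existing keyword sub-field.
# "wildcard" as the sub-field name is a hypothetical choice.
PUT parcels/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "normalizer": "search_normalizer" },
        "wildcard": { "type": "wildcard" }
      }
    }
  }
}

# Existing documents only get the new sub-field once they are re-indexed.
# An update-by-query with no body is one way to do that in place.
POST parcels/_update_by_query?conflicts=proceed

# Same search as in the question, pointed at the new sub-field.
# Both clauses sit in filter context, so no scores are computed.
GET parcels/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "wildcard": {
            "name.wildcard": { "value": "*smith*", "case_insensitive": true }
          }
        },
        {
          "geo_shape": {
            "shape": {
              "shape": {
                "type": "POLYGON",
                "coordinates": [
                  [
                    [-81.09980486601305, 32.063655184739936],
                    [-81.09980486601168, 32.05639855631687],
                    [-81.09128330779276, 32.05639855631687],
                    [-81.09128330779276, 32.06365489826756],
                    [-81.09980486601305, 32.063655184739936]
                  ]
                ]
              },
              "relation": "intersects"
            }
          }
        }
      ]
    }
  },
  "size": 10000
}
```

Re-indexing 140+ million documents in place is not cheap, so it may be worth trying this on a smaller copy of the index first to check that the speed-up justifies it.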