Background: We have ~140 million polygons split into 5 indices (region-[1-5]) with 2 shards each. The data was loaded with ES 7.10. The field containing the polygon is named 'shape' and is mapped as a geo_shape field.
Here's an indexed example:
"shape": {
  "type": "Polygon",
  "coordinates": [
    [
      [-80.661103428642, 28.0213473946004],
      [-80.6611091545036, 28.0210035893407],
      [-80.6615120749597, 28.021009053184],
      [-80.6615063490981, 28.0213528568402],
      [-80.661103428642, 28.0213473946004]
    ]
  ]
},
Our problem occurs when querying for polygons which intersect a given (usually hand-drawn) shape. e.g.:
GET region_parcels*/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "shape": {
            "shape": {
              "type": "POLYGON",
              "coordinates": [
                [
                  [-81.0864386380646, 32.07339101099513],
                  [-81.0890350163911, 32.07282734995984],
                  [-81.08907793173533, 32.07190002908301],
                  [-81.08796213278512, 32.07151818834138],
                  [-81.08648155340886, 32.071481822473295],
                  [-81.08459327826233, 32.07231823378],
                  [-81.0841426671478, 32.073136454828834],
                  [-81.08480785498352, 32.073645566452704],
                  [-81.08527992377016, 32.07390012120158],
                  [-81.08530138144226, 32.07390012120158],
                  [-81.0864386380646, 32.07339101099513]
                ]
              ]
            },
            "relation": "intersects"
          }
        }
      }
    }
  },
  "size": 1000
}
When we run the above query, we get some results that are up to 30 ft outside the drawn polygon. The false positives are not uniform, so we can't simply apply a negative buffer to our search polygon to get correct intersections. We have also used a single point in the middle of one of our indexed polygons as the search geometry and gotten back the intersected polygon as well as a few of the surrounding polygons.
Reading over the docs and blogs, it looks like specifying any sort of precision is still available but will soon be deprecated and that the new tessellation technique for indexing is supposed to be accurate up to a few mm out of the box.
Is there any way to set up the index/cluster or execute the query differently that we have overlooked to make spatial intersection queries more accurate?
Thank you.
Here is an actual example with a point in the center of one of the polygons. It returns 3 hits: the intersected one (correct) and one on either side of it (incorrect).
Request:
GET region_parcels*/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "shape": {
            "shape": {
              "type": "POINT",
              "coordinates": [-81.08111523359743, 32.04772418111284]
            },
            "relation": "intersects"
          }
        }
      }
    }
  },
  "_source": ["shape"],
  "explain": true,
  "size": 1000
}
Response:
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_shard" : "[<my_index>][0]",
        "_node" : "lrSfQEyVTWmWU828O6Qdsw",
        "_index" : "<my_index>",
        "_type" : "_doc",
        "_id" : "cY2O9XcBlBVQyhnplhLN",
        "_score" : 0.0,
        "_source" : {
          "shape" : {
            "coordinates" : [
              [
                [-81.0810247260436, 32.0478338967803],
                [-81.0811253535251, 32.0475727349866],
                [-81.0812173428069, 32.0475984458201],
                [-81.0811167162237, 32.0478596090633],
                [-81.0810247260436, 32.0478338967803]
              ]
            ],
            "type" : "Polygon"
          }
        },
        "_explanation" : {
          "value" : 0.0,
          "description" : "ConstantScore(IntersectsPrefixTreeQuery(fieldName=shape,queryShape=Pt(x=-81.08111523359743,y=32.04772418111284),detailLevel=21,prefixGridScanLevel=20))^0.0",
          "details" : [ ]
        }
      },
      {
        "_shard" : "<my_index>[0]",
        "_node" : "lrSfQEyVTWmWU828O6Qdsw",
        "_index" : "<my_index>",
        "_type" : "_doc",
        "_id" : "dI2O9XcBlBVQyhnplhLN",
        "_score" : 0.0,
        "_source" : {
          "shape" : {
            "coordinates" : [
              [
                [-81.0809327358636, 32.0478081852515],
                [-81.0810333624468, 32.0475470233845],
                [-81.0811253535251, 32.0475727349866],
                [-81.0810247260436, 32.0478338967803],
                [-81.0809327358636, 32.0478081852515]
              ]
            ],
            "type" : "Polygon"
          }
        },
        "_explanation" : {
          "value" : 0.0,
          "description" : "ConstantScore(IntersectsPrefixTreeQuery(fieldName=shape,queryShape=Pt(x=-81.08111523359743,y=32.04772418111284),detailLevel=21,prefixGridScanLevel=20))^0.0",
          "details" : [ ]
        }
      },
      {
        "_shard" : "[<my_index>][1]",
        "_node" : "8jO4hXBuQL-cGobekTsjwg",
        "_index" : "<my_index>",
        "_type" : "_doc",
        "_id" : "cI2O9XcBlBVQyhnplhLN",
        "_score" : 0.0,
        "_source" : {
          "shape" : {
            "coordinates" : [
              [
                [-81.0811167162237, 32.0478596090633],
                [-81.0812173428069, 32.0475984458201],
                [-81.0813093320886, 32.0476241574079],
                [-81.0812087064037, 32.0478853205776],
                [-81.0811167162237, 32.0478596090633]
              ]
            ],
            "type" : "Polygon"
          }
        },
        "_explanation" : {
          "value" : 0.0,
          "description" : "ConstantScore(IntersectsPrefixTreeQuery(fieldName=shape,queryShape=Pt(x=-81.08111523359743,y=32.04772418111284),detailLevel=21,prefixGridScanLevel=20))^0.0",
          "details" : [ ]
        }
      }
    ]
  }
}
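As a sanity check that does not depend on Elasticsearch, a plain ray-casting test can show which of the three returned polygons actually contains the query point. This is only a minimal sketch: the coordinates are copied from the request and response above, `point_in_ring` is a hypothetical helper (not part of any Elasticsearch client), and it treats longitude/latitude as planar x/y, which seems a reasonable assumption at parcel scale.

```python
def point_in_ring(point, ring):
    """Ray-casting test: True if the point lies inside the closed ring.

    Assumes planar coordinates and a ring whose last vertex repeats the first.
    Points exactly on the boundary are not handled specially.
    """
    px, py = point
    inside = False
    # Walk consecutive vertex pairs (the edges of the ring).
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings strictly to the right of the point.
            if px < x_cross:
                inside = not inside
    return inside


# Query point from the request above.
query_point = (-81.08111523359743, 32.04772418111284)

# Outer rings of the three hits, keyed by _id, copied from the response above.
hits = {
    "cY2O9XcBlBVQyhnplhLN": [
        (-81.0810247260436, 32.0478338967803),
        (-81.0811253535251, 32.0475727349866),
        (-81.0812173428069, 32.0475984458201),
        (-81.0811167162237, 32.0478596090633),
        (-81.0810247260436, 32.0478338967803),
    ],
    "dI2O9XcBlBVQyhnplhLN": [
        (-81.0809327358636, 32.0478081852515),
        (-81.0810333624468, 32.0475470233845),
        (-81.0811253535251, 32.0475727349866),
        (-81.0810247260436, 32.0478338967803),
        (-81.0809327358636, 32.0478081852515),
    ],
    "cI2O9XcBlBVQyhnplhLN": [
        (-81.0811167162237, 32.0478596090633),
        (-81.0812173428069, 32.0475984458201),
        (-81.0813093320886, 32.0476241574079),
        (-81.0812087064037, 32.0478853205776),
        (-81.0811167162237, 32.0478596090633),
    ],
}

results = {doc_id: point_in_ring(query_point, ring) for doc_id, ring in hits.items()}
for doc_id, inside in results.items():
    print(doc_id, "contains the point" if inside else "does not contain the point")
```

Under these assumptions, only the first hit (cY2O9XcBlBVQyhnplhLN) contains the point; the other two are its neighbours on either side, which matches the false positives described above.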
It turns out that the mapping for our shape field was explicitly being set with the property strategy: "recursive".
When we created the component template mapping for the field, we set 'ignore malformed' to true under the 'advanced settings' in Kibana. Whenever we loaded data into the index, it automatically used the old prefix-tree structure. This must be a bug, as you would not expect one of the advanced settings to change the tree type. I was able to replicate the behavior with a new mapping and index.
Since we wanted to keep the 'ignore malformed' option, I recreated the mapping by loading the JSON with:
"shape": {
  "type": "geo_shape",
  "ignore_malformed": true
}
This preserved our option, and when we loaded data into the index, it used the default tree. We confirmed this by rerunning our previous searches, which were now highly accurate (within inches, if not better).
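For anyone who wants to check their own mappings for the same problem, here is a rough sketch that scans a mapping's `properties` for geo_shape fields carrying any of the prefix-tree parameters. The parameter list reflects my reading of the geo_shape documentation (treat it as an assumption and verify against your version), `find_legacy_geo_shapes` is a hypothetical helper, and the two sample mappings are hand-written to mirror the before and after described above rather than copied from a real cluster.

```python
# Parameters that, as far as I can tell from the geo_shape docs, belong to the
# deprecated prefix-tree indexing. Setting any of them selects the old strategy.
LEGACY_PARAMS = {
    "tree", "precision", "tree_levels",
    "strategy", "distance_error_pct", "points_only",
}


def find_legacy_geo_shapes(properties, prefix=""):
    """Return {field_path: [legacy params]} for geo_shape fields that set any."""
    found = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "geo_shape":
            legacy = sorted(LEGACY_PARAMS.intersection(spec))
            if legacy:
                found[path] = legacy
        # Recurse into object fields that declare nested properties.
        if "properties" in spec:
            found.update(find_legacy_geo_shapes(spec["properties"], prefix=path + "."))
    return found


# Hypothetical "properties" sections mirroring the situation described above.
bad_mapping = {
    "shape": {"type": "geo_shape", "ignore_malformed": True, "strategy": "recursive"}
}
good_mapping = {
    "shape": {"type": "geo_shape", "ignore_malformed": True}
}

print(find_legacy_geo_shapes(bad_mapping))   # {'shape': ['strategy']}
print(find_legacy_geo_shapes(good_mapping))  # {}
```

The `explain` output is another tell: a description containing IntersectsPrefixTreeQuery, as in the response above, indicates the field is still being queried through the prefix tree.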