
Elasticsearch geo_shape query returns false positives


I have two geo_shapes in ES. I need to find the best way to determine whether one of the shapes (Green) contains or intersects another (Red). Below is a visual representation of three different cases:

Case I: easy to detect. Using the Green shape's coordinates, make a geo_shape query with "relation": "within".

Case II: also not a problem. Using the Green shape's coordinates, make a geo_shape query with "relation": "intersects".

Case III: a real problem. Using the Green shape's coordinates, I make a geo_shape query with "relation": "intersects", and the Red shape is returned as a result. I believe that is wrong: these shapes do not intersect each other (I think), even though one of their sides touches.
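For reference, the requests described in the three cases can be sketched as Python dicts holding the JSON request body. This is a minimal sketch assuming a request body for Elasticsearch 2.x or later (a `bool` filter wrapping a `geo_shape` query); the helper `geo_shape_query`, the field name `location` and the Green coordinates are hypothetical.

```python
import json

# Spatial relations accepted by the geo_shape query.
RELATIONS = {"intersects", "disjoint", "within", "contains"}


def geo_shape_query(field, polygon, relation):
    """Build a request body matching documents whose `field` shape stands in
    `relation` to the given polygon (a list of [lon, lat] rings)."""
    relation = relation.lower()
    if relation not in RELATIONS:
        raise ValueError(f"unsupported relation: {relation}")
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            "shape": {"type": "polygon", "coordinates": polygon},
                            "relation": relation,
                        }
                    }
                }
            }
        }
    }


# Hypothetical Green polygon: one closed ring of [lon, lat] pairs.
green = [[[13.0, 52.0], [13.5, 52.0], [13.5, 52.5], [13.0, 52.5], [13.0, 52.0]]]

case_1 = geo_shape_query("location", green, "within")      # indexed shapes lying within Green
case_2 = geo_shape_query("location", green, "INTERSECTS")  # indexed shapes sharing any point with Green
print(json.dumps(case_2, indent=2))
```

Case III sends the same body as Case II; only the indexed Red shape differs.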

Is there any way to avoid the false positives here? Any other suggestions on how to solve this task?

P.S. The coordinates are precise (for example, 13.335594692338). There are no additional mapping parameters such as tree_levels or precision.

[Image: the three cases, Green and Red shapes]


Solution

  • Every polygon stored in Elasticsearch as a geo_shape is transformed into a list of strings. To narrow this explanation down, I'm going to assume that the polygon you're storing uses geohash storage (the default for the geo_shape type).

    I won't go into great detail, but take a look at this image:

    [Image: geohash grid]

    and this description from the Elasticsearch docs (the details don't match your case, but it gives the big picture):

    Geohashes divide the world into a grid of 32 cells (4 rows and 8 columns), each represented by a letter or number. The g cell covers half of Greenland, all of Iceland, and most of Great Britain. Each cell can be further divided into another 32 cells, which can be divided into another 32 cells, and so on. The gc cell covers Ireland and England, gcp covers most of London and part of Southern England, and gcpuuz94k is the entrance to Buckingham Palace, accurate to about 5 meters.

    Your polygon is projected into a list of rectangles, each represented by a string (a geohash). The precision of this projection depends on the tree level. I don't know what the default tree level for Elasticsearch is, but if you're getting false positives, it seems to be too low for you.

    A tree level of 8 splits the world into rectangles of about 38.2 m x 19.1 m (at the equator). If an edge of your polygon passes through the middle of one of these rectangles, that rectangle's geohash may or may not (depending on the implementation) be assigned to your polygon.

    To solve your problem, you need to increase the tree level to match your needs (see the geo_shape mapping documentation). Beware, though, that the size of the index will increase greatly (it also depends on the size and complexity of the shapes). As an example, storing 1000 district-size polygons (some with hundreds of points) at a tree level of 8 gives an index of about 600-700 MB.

    Bear in mind that whatever tree level you choose, you always risk some false positives, because a geohash will never be a 100% precise representation of your shape. It's a precision vs. performance trade-off, and geohash is the performance-oriented choice.
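The geohash cells described in the quoted docs come from a simple encoding: each character adds 5 bits, and the bits alternately halve the longitude and latitude ranges. Below is a minimal Python sketch of the public geohash algorithm, not Elasticsearch's or Lucene's own code; `geohash_encode` is a hypothetical helper name.

```python
# Base-32 alphabet used by geohashes (it omits a, i, l and o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a point as a geohash of `precision` characters.

    Each character carries 5 bits; bits alternate between halving the
    longitude range and halving the latitude range, longitude first.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # bits collected for the current character
    bit_count = 0   # how many of its 5 bits are filled
    use_lon = True  # the first bit splits longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# Each extra character selects one of 32 sub-cells of the previous cell,
# so a longer geohash for the same point starts with the shorter one.
london = (51.5074, -0.1278)
for n in (1, 2, 3, 5):
    print(n, geohash_encode(*london, precision=n))
```

For central London the first three levels are g, gc and gcp, the same cells the quoted description names.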
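The 38.2 m x 19.1 m figure for tree level 8 follows from the bit layout: a level-n geohash has 5n bits, with longitude taking the extra bit when 5n is odd. The sketch below assumes a spherical Earth with the WGS 84 equatorial radius (6,378,137 m) and measures cells at the equator, where they are largest; it is an approximation, not the table Elasticsearch itself uses. The helper names are hypothetical.

```python
import math

# Circumference of a sphere with the WGS 84 equatorial radius, in meters.
EQUATOR_M = 2 * math.pi * 6_378_137


def cells_per_axis(level: int) -> tuple[int, int]:
    """Return (columns, rows) of the geohash grid at `level` characters."""
    bits = 5 * level
    lon_bits = (bits + 1) // 2  # longitude gets the extra bit when bits is odd
    lat_bits = bits // 2
    return 2 ** lon_bits, 2 ** lat_bits


def cell_size_m(level: int) -> tuple[float, float]:
    """Approximate (width, height) of one cell at the equator, in meters."""
    cols, rows = cells_per_axis(level)
    width = EQUATOR_M / cols        # 360 degrees of longitude split into cols
    height = (EQUATOR_M / 2) / rows  # 180 degrees of latitude split into rows
    return width, height


for level in range(1, 11):
    w, h = cell_size_m(level)
    print(f"level {level:2d}: {w:12.1f} m x {h:12.1f} m")
```

Level 1 gives the 8 columns by 4 rows from the quoted description. Each further level divides a cell by 32, so a polygon edge that lands inside a cell is blurred by at most about one cell size at that level.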
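The mechanism the answer describes (a shape projected onto grid cells, with two shapes reported as intersecting when they share a cell) can be shown with a deliberately simplified model. The sketch uses a uniform square grid in abstract units rather than Elasticsearch's geohash prefix tree, and axis-aligned rectangles rather than arbitrary polygons. `covering_cells` and `grid_intersects` are hypothetical helpers, not how Lucene actually evaluates the relation.

```python
import math

Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def covering_cells(rect: Rect, cell: float) -> set[tuple[int, int]]:
    """Return every half-open grid cell [k*cell, (k+1)*cell) containing
    at least one point of the closed rectangle, as (col, row) pairs."""
    min_x, min_y, max_x, max_y = rect
    cols = range(math.floor(min_x / cell), math.floor(max_x / cell) + 1)
    rows = range(math.floor(min_y / cell), math.floor(max_y / cell) + 1)
    return {(c, r) for c in cols for r in rows}


def grid_intersects(a: Rect, b: Rect, cell: float) -> bool:
    """Approximate INTERSECTS: true if the two cell sets share any cell."""
    return not covering_cells(a, cell).isdisjoint(covering_cells(b, cell))


green = (0.0, 0.0, 1.125, 1.0)
red_touching = (1.125, 0.0, 2.0, 1.0)  # shares Green's right edge (Case III)
red_near = (1.2, 0.0, 2.0, 1.0)        # 0.075 units to the right of Green
red_far = (1.5, 0.0, 2.0, 1.0)         # clearly separate

coarse, fine = 0.25, 0.0625  # cell sizes standing in for a low and a higher tree level
for name, red in [("touching", red_touching), ("near", red_near), ("far", red_far)]:
    print(name, grid_intersects(green, red, coarse), grid_intersects(green, red, fine))
```

The near pair is reported as intersecting on the coarse grid but not on the fine one, which is the effect of raising the tree level. The touching pair shares a cell at every cell size, because the shared edge belongs to both rectangles. Under the usual OGC definition, shapes that touch do intersect.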
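The tree level is raised in the geo_shape field mapping. The sketch below builds a mapping body as a Python dict with the prefix-tree parameters `tree`, `tree_levels` and `precision`. These belong to older Elasticsearch versions and were deprecated once geo_shape moved to BKD-tree indexing, so check the mapping docs for the version you run. The helper `geo_shape_mapping` and the `location` field are hypothetical.

```python
import json


def geo_shape_mapping(field, tree_levels=None, precision=None):
    """Build a (legacy) prefix-tree geo_shape mapping for `field`.

    `tree_levels` (an int) and `precision` (a distance string such as "5m")
    are alternative ways of setting the same resolution, so only one is allowed.
    """
    if tree_levels is not None and precision is not None:
        raise ValueError("set either tree_levels or precision, not both")
    props = {"type": "geo_shape", "tree": "geohash"}
    if tree_levels is not None:
        props["tree_levels"] = tree_levels
    if precision is not None:
        props["precision"] = precision
    # Typeless mapping body (7.x style); older versions nest this under a type name.
    return {"mappings": {"properties": {field: props}}}


# Index-creation body for a hypothetical "location" field using level-9
# geohash cells (roughly 4.8 m x 4.8 m at the equator).
body = geo_shape_mapping("location", tree_levels=9)
print(json.dumps(body, indent=2))
```

Already indexed documents keep their old cell representation, so changing the resolution generally means creating a new index with this mapping and reindexing into it.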