azure-cosmosdbgeospatialspatial-query

CosmosDB ST_WITHIN and ST_INTERSECTS give different results than using explicit latitude / longitude values


I have a CosmosDB container with 10000 documents. Every document contains a random (geography) Point that is within a specific bounds.

I am using the CosmosDB emulator and Azure Cosmos DB .NET SDK (v3) to perform some geo-spatial queries. however, queries using ST_INTERSECTS and ST_WITHIN are giving different results than a query using latitude and longitude min/max values, as illustrated below:

1 - Query using explicit latitude / longitude minimum and maximum values - gives the correct result:

SELECT r.id FROM testcontainer r 
 WHERE r.lat >= 53.393183 AND r.lat <= 53.657421
   AND r.lon >= -113.733431 AND r.lon <= -113.247049

Query Total time: 00:00:00.1293465
Total Request Units consumed: 244.15
Documents found: 10000

2 - Query using ST_INTERSECTS - should have the same result as query #1?

SELECT r.id FROM testcontainer r 
 WHERE ST_INTERSECTS( r.geopoint, {'type':'Polygon','coordinates':[[[-113.247049,53.657421],[-113.733431,53.657421],[-113.733431,53.393183],[-113.247049,53.393183],[-113.247049,53.657421]]]})

Query Total time: 00:00:00.3134729
Total Request Units consumed: 226.17000000000002
Documents found: 9994

3 - Query using ST_WITHIN - should have the same result as query #1?

SELECT r.id FROM testcontainer r 
 WHERE ST_WITHIN( r.geopoint, {'type':'Polygon','coordinates':[[[-113.247049,53.657421],[-113.733431,53.657421],[-113.733431,53.393183],[-113.247049,53.393183],[-113.247049,53.657421]]]})

Query Total time: 00:00:00.3107136
Total Request Units consumed: 962.91
Documents found: 9994

All of the queries above should have returned 10000 documents -- but the ST_INTERSECTS and ST_WITHIN seem to fall short and only query #1 gives the correct result - I cannot explain why.

Does anyone have any insights as to why ST_WITHIN and ST_INTERSECTS would not give the same result as the query with explicit latitude / longitude values?

Additional Information:

When I do the following query:

SELECT r.id, r.geopoint FROM testcontainer r 
 WHERE NOT ST_INTERSECTS( r.geopoint, {'type':'Polygon','coordinates':[[[-113.247049,53.657421],[-113.733431,53.657421],[-113.733431,53.393183],[-113.247049,53.393183],[-113.247049,53.657421]]]})

The query result is the missing 6 documents - but looking at the results, there is only one result with a longitude value of -113.247049 that matches a longitude value specified in the bounds.

[
    {
        "id": "6892de3d-4b8a-42c6-8421-18b65a70146f",
        "geopoint": {
            "type": "Point",
            "coordinates": [
                -113.465305,
                53.393386
            ]
        }
    },
    {
        "id": "6b89e0fc-7f65-4f75-90db-63dc599b98dd",
        "geopoint": {
            "type": "Point",
            "coordinates": [
                -113.247049,
                53.623584
            ]
        }
    },
    {
        "id": "467a388a-f0fd-4717-9eb5-3a8f538df38b",
        "geopoint": {
            "type": "Point",
            "coordinates": [
                -113.302698,
                53.393212
            ]
        }
    },
    {
        "id": "ada61462-5131-4e25-b0be-3bee42fcde05",
        "geopoint": {
            "type": "Point",
            "coordinates": [
                -113.362105,
                53.393276
            ]
        }
    },
    {
        "id": "3bfd30a7-abcd-4216-905f-5a3831869626",
        "geopoint": {
            "type": "Point",
            "coordinates": [
                -113.340676,
                53.393227
            ]
        }
    },
    {
        "id": "c259a408-e6f0-4050-b7ad-6f55d9a614da",
        "geopoint": {
            "type": "Point",
            "coordinates": [
                -113.400457,
                53.393275
            ]
        }
    }
]

Below is an Excel (screen snapshot) that shows the delta (difference) between the bounds lat/lon values and those from the "missing" documents:

Delta lat/lon values

Looking at the differences, a small deviation will exclude some points when using ST_WITHIN / ST_INTERSECTS as per the answer given by Michael Entin.

Thanks.


Solution

  • That's the difference between geography (shapes on a sphere with geodesic edges) and geometry (shapes on a flat map with edges that look straight on that map).

    If you work with geography, the "horizonal" edges of your GeoJson "rectangle" follow geodesics and thus bend a bit towards poles. Thus the rectangle describes a slightly different shape from what you expect.

    If the min/max results in your first query are what you want - switch to geometry.