I have a mysql query to get all places count from an area. If I query for only one id it's really quick, if I query for two ids or more then it's really slow.
Areas.geometry and Places.location are SPATIAL indexes.
There is only 3 rows ( all have complex geometry. The row 3 is the more complex ) in areas table and 3000 rows in stores. I build up a demo sql file to import if you want to test : geospatial-exemple.sql
Some exemple :
This query is running in 260ms :
select a.name,
(
SELECT count(*)
FROM places p
WHERE ST_Contains(a.geometry,p.location)
) as places_count
FROM areas a
WHERE a.id in (1)
This query is running in 320ms :
select a.name,
(
SELECT count(*)
FROM places p
WHERE ST_Contains(a.geometry,p.location)
) as places_count
FROM areas a
WHERE a.id in (3)
This query is running in 50s :
select a.name,
(
SELECT count(*)
FROM places p
WHERE ST_Contains(a.geometry,p.location)
) as places_count
FROM areas a
WHERE a.id in (1,3)
I also tried to hardcode the areas.geometry in the query with the more complex MULTIPOLYGON
This query is running in 380ms :
select a.name,
(
SELECT count(*)
FROM places p
WHERE ST_Contains(ST_GeomFromText("MULTIPOLYGON((...))",
4326,
'axis-order=long-lat'),p.location)
) as places_count
FROM areas a
WHERE a.id in (1,3)
So clearly it's faster to run multiple queries than only one and wait for a minutes. If somebody know if it's a mysql bug or if there is another way to do that ? Working with Join query give the same results.
According to John Powells answer here, there is an undocumented limitation for spatial indexes:
For the Contains and Intersects functions to work properly, and for the index to be used, you need to have one of the geometries be a constant. This doesn't appear to be documented, although all the examples you will see with MySQL with Intersects/Contains work this way.
So running multiple queries with one area each would indeed be faster.
If you have the permissions to create functions, you can however use a workaround by running your subquery in a function, where areas.geometry
will now act as a constant parameter for ST_Contains()
:
CREATE FUNCTION fn_getplacescount(_targetarea GEOMETRY)
RETURNS INT READS SQL DATA
RETURN (SELECT COUNT(*) FROM places p WHERE ST_Contains(_targetarea, p.location));
Now
SELECT a.name, fn_getplacescount(a.geometry) AS places_count
FROM areas a WHERE a.id in (1,3);
would be similar to running each area separately, and should have a similar execution time as using two separate queries.