I'm working on a query using the PostGIS extension that implements a 'spatial join' work. Running the query took an incredibly long time and failed in the end. The query is as follows:
CREATE INDEX buffer_table_geom_idx ON buffer_table USING GIST (geom);
CREATE INDEX point_table_geom_idx ON point_table USING GIST (geom);
SELECT
point_table.*,
buffer_table.something
FROM
point_table
LEFT JOIN buffer_table ON ST_Intersects (buffer_table.geom, point_table.geom);
where the point_table stands for a table that contains over 10 million rows of point records; the buffer_table stands for a table that contains only one multi-polygon geometry.
I would want to know if there is anything wrong with my code and ways to adjust. Thanks in advance.
With a LEFT JOIN
you're going through every single record of point_table
and therefore ignoring the index. Try this and see the difference:
SELECT point_table.*
FROM point_table
JOIN buffer_table ON ST_Contains(buffer_table.geom, point_table.geom);
ST_SubDivide
Considering the size of your multipolygon (see comments), it might be interesting to divide it into smaller pieces, so that the number of vertices for each containment/intersection calculation also gets reduced, consequently making the query less expensive.
First divide the large geometry into smaller pieces and store in another table (you can also use a CTE/Subquery)
CREATE TABLE buffer_table_divided AS
SELECT ST_SubDivide(geom) AS geom FROM buffer_table
CREATE INDEX buffer_table_geom_divided_idx ON buffer_table_divided USING GIST (geom);
.. and perform your query once again against this new table:
SELECT point_table.*
FROM point_table
JOIN buffer_table_divided d ON ST_Contains (d.geom, point_table.geom);
Demo: db<>fiddle