I've got a PostGIS database of points in Postgres, and I would like to extract the points in several geographically distinct areas to CSV files, one file per area.
I have set up an area
table with area polygons, and area titles and I would like to effectively loop through that table, using something like Postgis' st_intersects()
to select the data to go in each CSV file, and get the filename for the CSV file from the title in the area table.
I'm comfortable with the details of doing the intersection code, and setting up the CSV output - what I don't know is how to do it for each area. Is it possible to do something like this with some sort of join? Or do I need to do it with a stored procedure, and use a loop construct in plpgsql?
You can loop over rows in your area
table in plpgsql. But be careful to get quoting of identifiers and values right:
Assuming this setup:
CREATE TABLE area (
title text PRIMARY KEY
, area_polygon geometry
);
CREATE TABLE points(
point_id serial PRIMARY KEY
, the_geom geometry);
You can use this plpgsql block:
DO
$do$
DECLARE
_title text;
BEGIN
FOR _title IN
SELECT title FROM area
LOOP
EXECUTE format('COPY (SELECT p.*
FROM area a
JOIN points p ON ST_INTERSECTS(p.the_geom, a.area_polygon)
WHERE a.title = %L) TO %L (FORMAT csv)'
, _title
, '/path/to/' || _title || '.csv');
END LOOP;
END
$do$;
Use format with %L
(for string literal) to get properly quoted strings to avoid syntax errors and possible SQL injection. You still need to use strings in area.title
that work for file names.)
Also careful to quote the filename as a whole, not just the title part of it.
You must concatenate the whole command as string. The "utility command" COPY
does not allow variable substitution. That's only possible with the core DML commands SELECT
, INSERT
, UPDATE
, and DELETE
. See:
So don't read out area.area_polygon
in the loop. It would have to be cast to text
to concatenate it into the query string, where the text representation would be cast back to geometry
(or whatever your actual undisclosed data type is). That's prone to errors.
Instead I only read area.title
to uniquely identify the row and handle the rest in the query internally.