postgresqlcsvpostgisplpgsqlpostgresql-copy

How to COPY to multiple CSV files in PostgreSQL?


I've got a PostGIS database of points in Postgres, and I would like to extract the points in several geographically distinct areas to CSV files, one file per area.

I have set up an area table with area polygons, and area titles and I would like to effectively loop through that table, using something like Postgis' st_intersects() to select the data to go in each CSV file, and get the filename for the CSV file from the title in the area table.

I'm comfortable with the details of doing the intersection code, and setting up the CSV output - what I don't know is how to do it for each area. Is it possible to do something like this with some sort of join? Or do I need to do it with a stored procedure, and use a loop construct in plpgsql?


Solution

  • You can loop over rows in your area table in plpgsql. But be careful to get quoting of identifiers and values right:

    Assuming this setup:

    CREATE TABLE area (
      title text PRIMARY KEY
    , area_polygon geometry
    ); 
    
    CREATE TABLE points(
      point_id serial PRIMARY KEY
    , the_geom geometry);
    

    You can use this plpgsql block:

    DO
    $do$
    DECLARE
       _title text;
    BEGIN
       FOR _title IN
          SELECT title FROM area
       LOOP
          EXECUTE format('COPY (SELECT p.*
                                FROM   area   a
                                JOIN   points p ON ST_INTERSECTS(p.the_geom, a.area_polygon)
                                WHERE  a.title = %L) TO %L (FORMAT csv)'
                       , _title
                       , '/path/to/' || _title || '.csv');
       END LOOP;
    END
    $do$;
    

    Use format with %L (for string literal) to get properly quoted strings to avoid syntax errors and possible SQL injection. You still need to use strings in area.title that work for file names.)

    Also careful to quote the filename as a whole, not just the title part of it.

    You must concatenate the whole command as string. The "utility command" COPY does not allow variable substitution. That's only possible with the core DML commands SELECT, INSERT, UPDATE, and DELETE. See:

    So don't read out area.area_polygon in the loop. It would have to be cast to text to concatenate it into the query string, where the text representation would be cast back to geometry (or whatever your actual undisclosed data type is). That's prone to errors.

    Instead I only read area.title to uniquely identify the row and handle the rest in the query internally.