postgresqlbashcopyzipgreenplum

Postgresql COPY inside another COPY FROM PROGRAM fires error


Here is a command to unzip & copy csv data to a GreenPlum cluster via it's segments:

copy dds.entry from program 'gunzip < /data/adbdata/export/dds.entry.<SEGID>.csv.gz' delimiter ';' on segment;

It works perfectly well when run on it's own, but when being encapsulated inside another copy command like this:

execute 'copy __t_run_ui (command_output) from program 'psql -U gpadmin -d masked_dwh -c 
"copy dds.entry from program 'gunzip < /data/adbdata/export/dds.entry.<SEGID>.csv.gz' delimiter ';' on segment;" ';

it fires 'syntax error at or near "gunzip"'. Made escapes like this (look E + backslashes before inner quotes):

execute 'copy __t_run_ui (command_output) from program 'psql -U gpadmin -d masked_dwh -c 
E"copy dds.entry from program \'gunzip < /data/adbdata/export/dds.entry.<SEGID>.csv.gz\' delimiter \';\' on segment;" ';

it served to no help, same error. The whole thing works inside plpgsql function operating two DBs (source dwh and destination masked_dwh). This func uses bash scripts both to switch connections between DBs and run sql commands for export & import. As of now I'm trying to run bash via 'copy ... program', but copy command fails to run with another copy inside, even when the latter is saved & called as sql file on disk or when it's program parameter is ommited and it deals with pure csv without unzipping:

execute 'copy __t_run_ui (command_output) from program 'psql -U gpadmin -d masked_dwh -c 
E"copy dds.entry from \'/data/adbdata/export/dds.entry.<SEGID>.csv\' delimiter ';' on segment;" ';

it fires 'syntax error at or near "<"' before SEGID keyword (GP okay with that) and I don't want to transfer data via a master node instead of segments. Need proper escape or something for my double copy to avoid a separate plsh/plpythonu/C func to run bash - these things are prohibited in our data storage, along with dblink. Installation: Greenplum Database 6.18.0 with PostgreSQL 9.4.24


Solution

  • As per the comment: the single quotes ' get mismatched. Switching to named dollar quotes $abc$ would make them easier to track.

    This example:

    execute 'copy __t_run_ui (command_output) from program 'psql -U gpadmin -d masked_dwh -c 
    "copy dds.entry from program 'gunzip < /data/adbdata/export/dds.entry.<SEGID>.csv.gz' delimiter ';' on segment;" ';
    

    Gets cut up like this:

    execute 
    'copy __t_run_ui (command_output) from program 
    '
    psql -U gpadmin -d masked_dwh -c
    "copy dds.entry from program 
        'gunzip < /data/adbdata/export/dds.entry.<SEGID>.csv.gz'
        delimiter ';' 
        on segment;
    " ';
    

    The single quote after program looks like it was supposed to open a nested quote and instead, it closes the initial one and leaves the psql part unquoted. I'm unsure why the error points at gunzip as the first place the parser gives up trying to understand the syntax, but I guess that message might have come from some other attempt than the one you showed.

    For posterity, here's how it can be fixed by replacing the single quotes used in PostgreSQL context:

    execute 
    $a$ copy __t_run_ui (command_output) 
        from program 
        $b$ psql -U gpadmin -d masked_dwh -c 
            ' copy dds.entry from program 
              $c$ gunzip < /data/adbdata/export/dds.entry.<SEGID>.csv.gz $c$
              delimiter $d$;$d$
              on segment;
            '
        $b$
    $a$;