postgresqlcsvpgloader

Is it possible to insert and replace rows with pgloader?


My use case is the following: I have data coming from a csv file and I need to load it into a table (so far so good, nothing new here). It might happen that same data is sent with updated columns, in which case I would like to try to insert and replace in case of duplicate.

So my table is as follows:

CREATE TABLE codes (
  code            TEXT NOT NULL,
  position_x      INT,
  position_y      INT
  PRIMARY KEY (code)
);

And incoming csv file is like this:

TEST01,1,1
TEST02,1,2
TEST0131,3
TEST04,1,4

It might happen that sometime in the future I get another csv file with:

TEST01,1,1000 <<<<< updated value
TEST05,1,5
TEST0631,6
TEST07,1,7

Right now what is happening is when I run for the first file, everything is fine, but when I execute for the second one I'm getting an error:

2017-04-26T10:33:51.306000+01:00 ERROR Database error 23505: duplicate key value violates unique constraint "codes_pkey"
DETAIL: Key (code)=(TEST01) already exists.

I load data using:

pgloader csv.load

And my csv.load file looks like this:

LOAD CSV
     FROM 'codes.csv' (code, position_x, position_y)
     INTO postgresql://localhost:5432/codes?tablename=codes (code, position_x, position_y)

     WITH fields optionally enclosed by '"',
          fields terminated by ',';

Is what I'm trying to do possible with pgloader?

I also tried dropping constrains for the primary key but then I end up with duplicate entries in the table.

Thanks a lot for your help.


Solution

  • No, you can't. As per reference

    To work around that (load exceptions, eg PK violations), pgloader cuts the data into batches of 25000 rows each, so that when a problem occurs it's only impacting that many rows of data.

    in brackets - mine...

    The best you can do is load csv to table with same structure and then merge data with help of query (EXCEPT, OUTER JOIN ... where null and so on)