[SOLVED] Postgres import dump ignoring duplicates

Is there a way to import sql dumps into an existing database while simply ignoring everything that is already there? I have an automatically generated .sql dump. I try importing this using psql ... < dumpfile. However, it always tries to create the tables etc. and errors out. Then, when I delete the CREATE TABLE and ALTER TABLE statements from the dumpfile it errors out with "duplicate keys" messages. (In my situation, it is not important that the data is consistent, I am only using it for development.) Is there a way to import only new records?

N.B.: as there are multiple tables, and the process has to run on a recurring basis, the more automated, the better.

Solution

For one table

You can modify your CREATE TABLE statement so that it creates a temporary table, import the whole dump into it, then transfer only the missing rows to the real table.
This is the most flexible solution (you get a chance to adapt the data before reinjecting it into the target table),
but it is costly if most of the data is already there (you will still insert the whole dataset before discarding most of the rows).

CREATE temp TABLE <TABLE_NAME> (); -- Add `temp` between CREATE and TABLE in each CREATE TABLE statement, after having removed the schema prefix (`public.`).
INSERT INTO <TABLE_NAME> VALUES ...; -- Let the import run into this temp table, after having removed the schema prefix (`public.`).

alter table <table_name> rename to <table_name>_temp; -- Rename the temp table to unmask the original one.
insert into <table_name> select * from <table_name>_temp t where not exists (select 1 from <table_name> o where o.id = t.id); -- Transfer.

-- No need to drop the temp table:
-- it will automatically be freed when your session ends,
-- and an unqualified DROP would risk dropping the wrong table.
-- If you do want to drop it, prefix it explicitly:
drop table pg_temp.<table_name>_temp;

Alternatively, you can add a rule that directly ignores rows whose id already exists
(see https://stackoverflow.com/a/6176044/1346819):

create rule no_dup as
on insert to <table_name>
where exists (select 1 from <table_name> o where o.id = new.id)
do instead nothing;
INSERT INTO <table_name> VALUES ...; -- The INSERT part of your dump.

In all cases, do not forget to advance your primary key sequence afterwards, if the reimported rows have IDs greater than those that were already in the table:

select setval('<table_name>_id_seq', max(id)) from <table_name>;

For multiple tables

With help from PL/pgSQL's execute,
we can automate the first solution (the temp table one) for every table in the dump,
using mostly SQL and a bit of sed:

set search_path to pg_temp, public; -- For this session, redirect every new table created without a schema prefix to the temporary schema, by putting it explicitly before the standard one.

-- Replay your dump here /!\ AFTER HAVING REMOVED ALL REFERENCES TO THE SCHEMA AND TO search_path
-- This can be achieved in a shell with:
-- sed -e '/search_path/d' -e 's/public\.//g' < mydump.sql > mydumpnopath.sql
-- Using a different name for the postprocessed dump makes the following instruction failproof: it will fail if you have not run the sed command:
\i mydumpnopath.sql

-- Little helper, created explicitly in pg_temp so that it disappears with the session.
-- It has to be called schema-qualified: PostgreSQL never searches pg_temp for unqualified function names.
create function pg_temp.exec(req text) returns void language plpgsql as 'begin execute req; end;';

-- You can then dynamically rename and transfer data for all the tables.
-- Assuming they are the only temp tables you created in your session, they are easy to spot in pg_tables:
select pg_temp.exec('alter table '||tablename||' rename to '||tablename||'_temp') from pg_tables where schemaname like 'pg_temp%'; -- Rename the temp table to unmask the original one.
select pg_temp.exec('insert into '||left(tablename, -5)||' select * from '||tablename||' on conflict do nothing') from pg_tables where schemaname like 'pg_temp%'; -- Transfer.
with t as (select left(tablename, -5) tablename from pg_tables where schemaname like 'pg_temp%')
select pg_temp.exec('select setval('''||tablename||'_id_seq'', max(id)) from '||tablename) from t; -- Assuming all your tables have their primary key on a column named id, backed by a sequence named <table_name>_id_seq.
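
The sed preprocessing mentioned in the script comments can be tried on a tiny dump before running it on a real one. The following is only a sketch: the `strip_schema` function name and the three-line sample dump are made up for illustration, and only the two sed expressions come from the answer above. Keep in mind that the substitution is purely textual, so it would also alter any `public.` occurring inside the data itself.

```shell
#!/bin/sh
# Hypothetical helper: drop every line mentioning search_path
# and remove the `public.` schema prefix, as in the sed command above.
strip_schema() {
    sed -e '/search_path/d' -e 's/public\.//g' "$1"
}

# Made-up miniature dump, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
SELECT pg_catalog.set_config('search_path', '', false);
CREATE TABLE public.users (id integer PRIMARY KEY, name text);
INSERT INTO public.users VALUES (1, 'alice');
EOF

# Show what would be replayed into the temporary schema.
cleaned=$(strip_schema "$sample")
printf '%s\n' "$cleaned"
rm -f "$sample"
```

On a real dump, the output would be redirected to mydumpnopath.sql and replayed with \i as shown above.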