I have a non-empty PostgreSQL table with a GENERATED ALWAYS AS IDENTITY
column id
. I do a bulk insert with the C++ binding pqxx::stream_to, which I'm assuming uses COPY FROM
. My problem is that I want to know the id
s of the newly created rows, but COPY FROM
has no RETURNING
clause. I see several possible solutions, but I'm not sure if any of them is good, or which one is the least bad:
Provide the id
s manually through COPY FROM
, taking care to give the values which the identity sequence would have provided, then afterwards synchronize the sequence with setval(...)
.
First stream the data to a temp-table with a custom index column for ordering. Then do something like
INSERT INTO foo (col1, col2)
SELECT ttFoo.col1, ttFoo.col2 FROM ttFoo
ORDER BY ttFoo.idx RETURNING foo.id
and depend on the fact that the identity sequence produces ascending numbers to correlate them with ttFoo.idx
(I cannot do RETURNING ttFoo.idx
too because only the inserted row is available for that which doesn't contain idx
)
Query the current value of the identity sequence prior to insertion, then check afterwards which rows are new.
I would assume that this is a common situation, yet I don't see an obviously correct solution. What do you recommend?
You can find out which rows have been affected by your current transaction using the system columns. The xmin
column contains the ID of the inserting transaction, so to return the id
values you just copied, you could:
BEGIN;
COPY foo(col1,col2) FROM STDIN;
SELECT id FROM foo
WHERE xmin::text = (txid_current() % (2^32)::bigint)::text
ORDER BY id;
COMMIT;
The WHERE
clause comes from this answer, which explains the reasoning behind it.
I don't think there's any way to optimise this with an index, so it might be too slow on a large table. If so, I think your second option would be the way to go, i.e. stream into a temp table and INSERT ... RETURNING
.