DROP TABLE IF EXISTS A;
CREATE TABLE a (id int);
CREATE or replace FUNCTION insert_and_return(int)
RETURNS int AS $$
BEGIN
INSERT INTO a VALUES ($1);
RETURN $1;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM insert_and_return(10),A AS y;
Above code has data insertion side-effects in SELECT through a function.
Expected
insert_and_return | id |
---|---|
10 | 10 |
I was expecting insert_and_return
to insert a new row when FROM insert_and_return(10),A
does a cross-join, before SELECT sees it and returns it in result.
Actual:
Blank resultset
Question:
Why do I not see the data? I have to SELECT again to see it.
I think the SELECTs are always seeing 1 row less than there is.
I hope to understand the order of events under the hood, whether data is already inserted but somehow hidden from SELECT, or was not inserted at the moment SELECT is executed.
Related questions
Has this question got anything to do with isolation levels or it's irrelevant because isolation levels only apply to multiple transactions?
Why can data be generated on the fly with CASE or patterns like SELECT price, price * 0.9
but doesn't work here?
I tried creating separation with CTE but see the same behavior where I only see data on running the whole query below a 2nd time. Can this be explained with the same reasons for above, or CTE has its own caveats?
WITH inserted_data AS (
INSERT INTO a (id) VALUES (10)
RETURNING id
)
SELECT * FROM inserted_data,A;
Why other help failed
All sources I find online talk about concurrent read/write from multiple transactions which I believe are irrelevant because this is a single query in a single implicit transaction.
Origin of 1st query
I tweaked it from this article which focuses on load balancing instead of my question: https://www.cybertec-postgresql.com/en/why-select-from-table-is-not-a-read , please point out if my question is too edge case to be practically relevant
I was expecting
insert_and_return
to insert a new row whenFROM insert_and_return(10),A
does a cross-join, beforeSELECT
sees it and returns it in result.
Actual: Blank resultset
The function does insert the row and it also does return it to your query. It's then cross-joined with an empty set that the query sees in table a
, which nullifies the whole thing. EXPLAIN ANALYZE VERBOSE
can show you the intermediate query does yield rows=1
and it's only the final one that ends up empty because that one row got joined to zero rows:
demo at db<>fiddle
explain analyze verbose SELECT * FROM insert_and_return(10),A AS y;
Nested Loop (cost=0.25..61.26 rows=2550 width=8) (actual time=0.195..0.196 rows=0 loops=1) |
Output: insert_and_return.insert_and_return, y.id |
-> Function Scan on public.insert_and_return (cost=0.25..0.26 rows=1 width=4) (actual time=0.187..0.187 rows=1 loops=1) |
Output: insert_and_return.insert_and_return |
Function Call: insert_and_return(10) |
-> Seq Scan on public.a y (cost=0.00..35.50 rows=2550 width=4) (actual time=0.006..0.006 rows=0 loops=1) |
Output: y.id |
Planning Time: 0.106 ms |
Execution Time: 0.239 ms |
I'm not sure what and why you're attempting here but if you're trying to see the entire rows you inserted, not just the value your provided for one column, it sounds like you just wish to
insert into a values(10)returning *;
If you're trying to see the final state of the whole table right after inserting, you're looking for a union
:
table a
union all
select insert_and_return(10);
The table
command is just a shorthand form of a select*from
in PostgreSQL. It's the only reasonable form of involving table a
unconditionally - otherwise, if there's anything in that table, a comma ,
cross join would hit you with however many rows you just inserted, times however many you had earlier, everything paired up and listed with everything else, that many times.
If you're trying to compare the payload coming in to what effectively ends up written to the table after firing all trigger
s (and rule
s, with some caveats), that's exactly what returning
shows:
If there are triggers (Chapter 37) on the target table, the data available to
RETURNING
is the row as modified by the triggers. Thus, inspecting columns computed by triggers is another common use-case forRETURNING
.
That means if the values get processed somehow, returning
won't show what came in initially, only what ended up in the table, so returning *
loses sight of the raw input. To see it, you can repeat it as a constant:
insert into a values('your_input_value')returning 'your_input_value',*;
Example in the demo:
create function trg1_f()returns trigger as $f$
begin
NEW.id = NEW.id + 1; --incoming payload gets increased here
return NEW;
end $f$ language plpgsql;
create trigger trg1 before insert on a--happens before the write
for each row execute function trg1_f();
insert into a values(10)returning 10 as payload,*;--payload added as constant
payload | id |
---|---|
10 | 11 |
Question: Why do I not see the data? I have to SELECT again to see it.
I think the SELECTs are always seeing 1 row less than there is.
I hope to understand the order of events under the hood, whether data is already inserted but somehow hidden from SELECT, or was not inserted at the moment SELECT is executed.
There isn't really any deterministic sequence of those operations. The function scan and the table scan may or may not be executed in some order or another, or in parallel both at once - the planner/optimizer is free to rearrange those however it sees fit.
You can use subqueries, lateral
and CTE's to define some sort of flow within a single, multi-statement query, but that flow cannot involve the table state, it can't write to it, then from it, then again to it. Because of MVCC, all statements in your query see the exact same snapshot of that table for the entire lifetime of the query.
Related questions
Has this question got anything to do with isolation levels or it's irrelevant because isolation levels only apply to multiple transactions?
That's exactly right. Transaction isolation has nothing to do with statements of a single query.
Why can data be generated on the fly with CASE or patterns like
SELECT price, price * 0.9
but doesn't work here?
It does work here, you did generate data on the fly: you specified a constant 10
and it did get written into the table and returned from it. It's just that you decided to cross join
that with an empty set, so you got an empty set. Another illustration:
select price, price*0.9
from (values(1)) as v(price)
cross join (select 2 limit 0) as unrelated_empty_set(col1);
Whether the empty set you cross join
with is related or not, doesn't matter.
I tried creating separation with CTE but see the same behavior where I only see data on running the whole query below a 2nd time. Can this be explained with the same reasons for above, or CTE has its own caveats?
WITH inserted_data AS ( INSERT INTO a (id) VALUES (10) RETURNING id ) SELECT * FROM inserted_data,A;
It doesn't matter. The cross join A
resulting from the old implicit comma join syntax ,A
wipes your results from inserted_data
because you're cross joining with an empty set: throughout the lifetime of this query the snapshot of A
doesn't reflect any changes that the query aims to apply to it as its consequence.
If you really want everything side-by-side with everything, trade that for a qualified join:
WITH inserted_data AS (
INSERT INTO a (id) VALUES (10)
RETURNING id
)
SELECT *
FROM inserted_data
FULL JOIN a USING(id);
That gets you everything you just inserted on the left, paired up with nulls on the right if it's completely new or with any matching id
s you already had in table a
earlier, then lists everything else from a
on the right. Unlike the cross join
, it only multiplies the newcomers that find more than one id
match in the target table.
You can also list everything together, once, using the union
mentioned earlier:
WITH inserted_data AS (
INSERT INTO a (id) VALUES (10)
RETURNING id
)
SELECT*FROM a
UNION ALL
SELECT*FROM inserted_data;