sqlpostgresqltransactions

Why does database not read from newly inserted data when a single query is doing both write and read?


DROP TABLE IF EXISTS A;
CREATE TABLE a (id int);

CREATE or replace FUNCTION insert_and_return(int) 
    RETURNS int AS $$
    BEGIN
        INSERT INTO a VALUES ($1);
        RETURN $1;
    END;
$$ LANGUAGE plpgsql;

SELECT * FROM insert_and_return(10),A AS y;

Above code has data insertion side-effects in SELECT through a function.

Expected

insert_and_return id
10 10

I was expecting insert_and_return to insert a new row when FROM insert_and_return(10),A does a cross-join, before SELECT sees it and returns it in result.

Actual:

Blank resultset

Question:

Why do I not see the data? I have to SELECT again to see it.

I think the SELECTs are always seeing 1 row less than there is.

I hope to understand the order of events under the hood, whether data is already inserted but somehow hidden from SELECT, or was not inserted at the moment SELECT is executed.

Related questions

I tried creating separation with CTE but see the same behavior where I only see data on running the whole query below a 2nd time. Can this be explained with the same reasons for above, or CTE has its own caveats?

WITH inserted_data AS (
    INSERT INTO a (id) VALUES (10)
    RETURNING id
)
SELECT * FROM inserted_data,A;

Why other help failed

All sources I find online talk about concurrent read/write from multiple transactions which I believe are irrelevant because this is a single query in a single implicit transaction.

Origin of 1st query

I tweaked it from this article which focuses on load balancing instead of my question: https://www.cybertec-postgresql.com/en/why-select-from-table-is-not-a-read , please point out if my question is too edge case to be practically relevant


Solution

  • I was expecting insert_and_return to insert a new row when FROM insert_and_return(10),A does a cross-join, before SELECT sees it and returns it in result.
    Actual: Blank resultset

    The function does insert the row and it also does return it to your query. It's then cross-joined with an empty set that the query sees in table a, which nullifies the whole thing. EXPLAIN ANALYZE VERBOSE can show you the intermediate query does yield rows=1 and it's only the final one that ends up empty because that one row got joined to zero rows:
    demo at db<>fiddle

    explain analyze verbose SELECT * FROM insert_and_return(10),A AS y;
    
    Nested Loop (cost=0.25..61.26 rows=2550 width=8) (actual time=0.195..0.196 rows=0 loops=1)

      Output: insert_and_return.insert_and_return, y.id

      -> Function Scan on public.insert_and_return (cost=0.25..0.26 rows=1 width=4) (actual time=0.187..0.187 rows=1 loops=1)

            Output: insert_and_return.insert_and_return

            Function Call: insert_and_return(10)

      -> Seq Scan on public.a y (cost=0.00..35.50 rows=2550 width=4) (actual time=0.006..0.006 rows=0 loops=1)

            Output: y.id

    Planning Time: 0.106 ms
    Execution Time: 0.239 ms

    I'm not sure what and why you're attempting here but if you're trying to see the entire rows you inserted, not just the value your provided for one column, it sounds like you just wish to

    insert into a values(10)returning *;
    

    If you're trying to see the final state of the whole table right after inserting, you're looking for a union:

    table a
    union all
    select insert_and_return(10);
    

    The table command is just a shorthand form of a select*from in PostgreSQL. It's the only reasonable form of involving table a unconditionally - otherwise, if there's anything in that table, a comma , cross join would hit you with however many rows you just inserted, times however many you had earlier, everything paired up and listed with everything else, that many times.

    If you're trying to compare the payload coming in to what effectively ends up written to the table after firing all triggers (and rules, with some caveats), that's exactly what returning shows:

    If there are triggers (Chapter 37) on the target table, the data available to RETURNING is the row as modified by the triggers. Thus, inspecting columns computed by triggers is another common use-case for RETURNING.

    That means if the values get processed somehow, returning won't show what came in initially, only what ended up in the table, so returning * loses sight of the raw input. To see it, you can repeat it as a constant:

    insert into a values('your_input_value')returning 'your_input_value',*;
    

    Example in the demo:

    create function trg1_f()returns trigger as $f$
    begin
       NEW.id = NEW.id + 1; --incoming payload gets increased here
       return NEW;
    end $f$ language plpgsql;
    
    create trigger trg1 before insert on a--happens before the write
    for each row execute function trg1_f();
    
    insert into a values(10)returning 10 as payload,*;--payload added as constant
    
    payload id
    10 11

    Question: Why do I not see the data? I have to SELECT again to see it.
    I think the SELECTs are always seeing 1 row less than there is.
    I hope to understand the order of events under the hood, whether data is already inserted but somehow hidden from SELECT, or was not inserted at the moment SELECT is executed.

    There isn't really any deterministic sequence of those operations. The function scan and the table scan may or may not be executed in some order or another, or in parallel both at once - the planner/optimizer is free to rearrange those however it sees fit.

    You can use subqueries, lateral and CTE's to define some sort of flow within a single, multi-statement query, but that flow cannot involve the table state, it can't write to it, then from it, then again to it. Because of MVCC, all statements in your query see the exact same snapshot of that table for the entire lifetime of the query.
    From 7.8.4. Data-Modifying Statements in WITH:

    The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates, and means that RETURNING data is the only way to communicate changes between different WITH sub-statements and the main query.


    Related questions
    Has this question got anything to do with isolation levels or it's irrelevant because isolation levels only apply to multiple transactions?

    That's exactly right. Transaction isolation has nothing to do with statements of a single query.


    Why can data be generated on the fly with CASE or patterns like SELECT price, price * 0.9 but doesn't work here?

    It does work here, you did generate data on the fly: you specified a constant 10 and it did get written into the table and returned from it. It's just that you decided to cross join that with an empty set, so you got an empty set. Another illustration:

    select price, price*0.9 
    from (values(1)) as v(price)
    cross join (select 2 limit 0) as unrelated_empty_set(col1);
    

    Whether the empty set you cross join with is related or not, doesn't matter.


    I tried creating separation with CTE but see the same behavior where I only see data on running the whole query below a 2nd time. Can this be explained with the same reasons for above, or CTE has its own caveats?

    WITH inserted_data AS (
       INSERT INTO a (id) VALUES (10)
       RETURNING id
    )
    SELECT * FROM inserted_data,A;
    

    It doesn't matter. The cross join A resulting from the old implicit comma join syntax ,A wipes your results from inserted_data because you're cross joining with an empty set: throughout the lifetime of this query the snapshot of A doesn't reflect any changes that the query aims to apply to it as its consequence.

    If you really want everything side-by-side with everything, trade that for a qualified join:

    WITH inserted_data AS (
        INSERT INTO a (id) VALUES (10)
        RETURNING id
    )
    SELECT * 
    FROM inserted_data
    FULL JOIN a USING(id);
    

    That gets you everything you just inserted on the left, paired up with nulls on the right if it's completely new or with any matching ids you already had in table a earlier, then lists everything else from a on the right. Unlike the cross join, it only multiplies the newcomers that find more than one id match in the target table.

    You can also list everything together, once, using the union mentioned earlier:

    WITH inserted_data AS (
        INSERT INTO a (id) VALUES (10)
        RETURNING id
    )
    SELECT*FROM a
    UNION ALL
    SELECT*FROM inserted_data;