postgresqlpostgresql-16

PostgreSQL: is the CTE guaranteed to be evaluated before INSERT?


I need to upsert a row by primary key and return the previous value before the update. I know I could do this with two separate queries — a SELECT followed by an INSERT ... ON CONFLICT ... DO UPDATE, but is it good way to do it in one query? (It's acceptable for my case if something changes between those two steps – I don't need full isolation between them)

I'm using PostgreSQL 16.

Given a table like this:

CREATE TABLE test_table (
    user_id INT PRIMARY KEY,
    session_id INT
);

I wrote a simplified query:

WITH old_data AS (
  SELECT session_id FROM test_table WHERE user_id = :user_id
)
INSERT INTO test_table (user_id, session_id) VALUES (:user_id, :session_id)
ON CONFLICT (user_id) DO UPDATE SET session_id = :session_id
RETURNING (SELECT session_id FROM old_data);

This query works and returns the previous session_id as expected. However, I'm concerned about whether PostgreSQL guarantees the evaluation order: Will the SELECT from old_data always happen before the INSERT/UPDATE modifies the row? Or is it possible that the CTE could be evaluated after the row is updated, causing the SELECT to return the new value?

From the query plan, it seems the SELECT runs first:

Insert on test_table  (cost=8.17..8.18 rows=1 width=8) (actual time=0.110..0.111 rows=1 loops=1)
  Conflict Resolution: UPDATE
  Conflict Arbiter Indexes: test_table_pkey
  Tuples Inserted: 0
  Conflicting Tuples: 1
  InitPlan 1 (returns $0)
    ->  Index Scan using test_table_pkey on test_table test_table_1  (cost=0.15..8.17 rows=1 width=4) (actual time=0.028..0.028 rows=1 loops=1)
          Index Cond: (user_id = 1)
  ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=1)
Planning Time: 0.086 ms
Execution Time: 0.143 ms

But the PostgreSQL documentation doesn't seem to explicitly guarantee the evaluation order between a SELECT in CTE and the main INSERT ... ON CONFLICT.

Can someone clarify: Is it guaranteed that old_data will be evaluated before the actual INSERT/UPDATE operation? Or is this behavior incidental and could change?

Thanks!


Solution

  • There's no such guarantee and you don't need one.

    is the CTE guaranteed to be evaluated before INSERT?

    It's not. The planner/optimizer is free to rearrange the query and execute its parts in any order it estimates to be optimal, or in parallel all at once, or even discard some entirely, never running them at all.


    Will the SELECT from old_data always happen before the INSERT/UPDATE modifies the row?

    Yes, in your example it will because the outer on conflict depends on the output of that CTE. Note however, that the CTE's result is useless if there's no conflict; if at some point in a future version Postgres planner will be able to establish a conflict won't happen or delay the CTE's execution until it has to resolve one, it'll discard it completely once it's sure that additional expression is a waste of time.

    Your pkey-based single-value select from a single-column index might be cheap enough that it'll never make much sense to play tricks like that, though.


    is it possible that the CTE could be evaluated after the row is updated, causing the SELECT to return the new value

    Never. Thanks to MVCC, there's no way for any part of a multi-statement query to see the effects of another statement of that or any other query - the only way is to establish a pipeline in your query, where one expression's output is the next one's input.

    If they try to mutate and inspect an external database object (a table, a view, etc., that lives its own life in this db, as opposed to a subquery, SRF, CTE defined in this query), they all see the state as it was when the query started, whenever they look at any transaction-safe object in the db (sequences aren't transaction-safe, for example).

    CTEs can CRUD all they want and the final select will still see the table as if nothing happened:
    demo at db<>fiddle

    create table t(v) as values(1),(2),(3);
    with cte1 as(insert into t values(5))
       , cte2 as(delete from t where v=1)
       , cte3 as(update t set v=4 where v=2)
    select*from t;
    
    v
    1
    2
    3

    And that's what each of them sees when they target it:

    You can manually rearrange them and make them all try to target the same rows, it changes nothing. Only after that whole query finished, the changes are applied and visible to anyone:

    select*from t;
    
    v
    3
    5
    4

    PostgreSQL documentation doesn't seem to explicitly guarantee the evaluation order

    That's because SQL is declarative - you're only supposed to express what you want from the database, and it's the engine's job to formulate how to achieve that. To some extent, you can influence the planner by tweaking its settings, using special clauses like with cte as MATERIALIZED, among other things.