sqlpostgresqlcountsql-update

Is it possible to query the state of an updated table in postgresql within the same query?


I want to update a table in a WITH query (not a requirement, if it's achievable another way that's fine too) and count rows of the updated table (including the updated row) for the subsequent query.

example:

WITH updated AS (
  UPDATE table1
  SET enabled = true
  WHERE id = 'efgh'
  RETURNING data
)
UPDATE table2 t
SET data = u.data, count = (
    SELECT COUNT(*)
    FROM table1
    WHERE enabled = true
)
FROM updated u
WHERE id = 'abcd'
RETURNING t.count

The goal is that count would be set to 1

I've tried several variations and am starting to think I will need an extra request to count the columns after the update. Is there a way to achieve what I am trying to do in a single SQL query?


Solution

  • Your question shows that you already know that DML in CTE is not seen until out of the CTE, which is a base principle clearly specified in the doc (see "snapshot"); which implies there will be no exception to this (consistence preserving) rule.

    I can see two ways to implement your need within the CTE (in addition to the decoupled ways you are considering, and which I'll mention in my third paragraph).

    Things will be eased if you've got a primary key,
    so in what follows let's suppose your id is one.

    WHERE looking up in updated

    If the test in the final count is strictly equivalent to the prior SET (which looks true in the simplified query you gave, with enabled = true first as a SET then in the WHERE),
    you can of course count on your condition OR id IN (SELECT id FROM updated).

    Or, in your case of a simple count (with count(a||b) == count(a) + count(b)), just add counts of just updated rows with past updated rows excluding those counted in the "just updated" ones:

    WITH updated AS (
      UPDATE table1
      SET enabled = true
      WHERE id = 'efgh'
      RETURNING *
    )
    UPDATE table2 t
    SET data = u.data, count = (
        SELECT COUNT(*)
        FROM table1
        WHERE id NOT IN (SELECT id FROM UPDATED)
        AND enabled = true
    ) + (SELECT COUNT(*) FROM updated)
    FROM updated u
    WHERE t.id = 'abcd'
    RETURNING t.count;
    

    (see it in a demo)

    Tables masking

    If the filter is more complex, you'll prefer working on a CTE that (incidentally…) has the same structure and data as table1, except they're updated.

    WITH updated AS (
      UPDATE table1
      SET enabled = CASE WHEN id = 'efgh' THEN true ELSE enabled END
      -- ← No more WHERE here, we update the full table, but ↑ look above the CASE to preserve data where change is unwanted.
      RETURNING *
    )
    UPDATE table2 t
    SET data = u.data, count = (
        SELECT COUNT(*)
        FROM updated -- ← Now every new data is seeable.
        WHERE enabled = true
    )
    FROM updated u
    WHERE t.id = 'abcd'
    RETURNING t.count;
    

    (you could even start with an WITH table1 AS […] to be sure you never refer to the non-updated table1 in the rest of the CTE)

    Of course this updates the full table1 (with triggers, reindexing, and so on), so a preferable way would combine the non-modified part of table1 with the rows of updated:

    WITH updated AS (
      UPDATE table1
      SET enabled = true
      WHERE id = 'efgh'
      RETURNING *
    ),
    -- Here we redefine table1 as the union of unmodified and modified rows:
    table1 AS (
      SELECT * FROM table1 WHERE id NOT IN (SELECT id FROM UPDATED)
      UNION ALL
      SELECT * FROM updated
    )
    UPDATE table2 t
    SET data = u.data, count = (
        SELECT COUNT(*)
        FROM table1 -- ← Now we can use the (updated) table1 transparently.
        WHERE enabled = true
    )
    FROM updated u -- ← Except here where we explicitely want to join on only the updated rows.
    WHERE t.id = 'abcd'
    RETURNING t.count;
    

    (demoed in this fiddle)

    After the CTE

    But those "tables masking" solutions start by making full rows copies CTE (and even full table copies),
    which may not be entirely compensated by PostgreSQL's clever optimizer (directly reading the source table instead of working on its materialized full copy).

    I for one would opt for the "2 queries way", benefiting from all newly updated indexes and letting the optimizer use well-known execution paths;
    even if it introduces a race condition, see the must-read Zegarek's comment, both enlightened and enlighting.