sqlpostgresqlsql-updatelateral-join

For each row in table A, update A.amount related rows in table B


Using PostgreSQL 16.2, I have two tables:

task_pool table

create table task_pool (
  task_id serial primary key
, order_id integer
, user_id integer
, expire_time timestamp
, status integer default 0
);

Sample data:

task_id order_id user_id expire_time status
50 80 1 2024-03-00 11:52:44 -1(expired)
51 81 1 2024-03-00 16:00:44 -1 (expired)
52 80 2 2024-04-19 12:31:22 0 (active)
53 81 4 2024-03-00 16:52:44 -1(expired)
54 81 6 2024-03-00 16:51:44 -1(expired)

reactivate_task table this table is actually a temporary table using WITH AS in a longer query, but for simplicity here is code to create the table:

create table reactivate_task(
, order_id integer
, successfully_added
, total_amount_requested
, amount_to_reactivate integer
);

Sample data:

order_id successfully_added total_amount_requested amount_to_reactivate
80 0 3 3
81 0 3 2

What I'm trying to do is, for each row in reactivate_task, update the x oldest expired tasks with the related order_id, and x being that row's amount_to_reactivate

So in the example tables above, looking at reactivate_task table, the 3 oldest expired tasks with order_id 80 should be "reactivated". And the 2 oldest expired_tasks with order_id 81 should be "reactivated".

There are the intended results in table task_pool:

task_id order_id user_id expire_time status
50 80 1 2024-03-21 12:00:00 0(active) --status set to 0 and expire_time to current_timestamp
51 81 1 2024-03-21 12:00:00 0(active) ----status set to 0 and expire_time to current_timestamp
52 80 2 2024-04-19 12:31:22 0 (active) --NOT updated (even though amount_to_reactive was 3 for order_id 80, there was only ONE expired task (task_id 50) that was expired.
53 81 4 2024-03-19 16:52:44 -1(expired) --NOT updated (amount_to_reactive was only 2 for order 81, so only the 2 rows with oldest expire_time were updated (task_id 50 & 54)
54 81 6 2024-03-21 12:00:00 0 (active) ----status set to 0 and expire_time to current_timestamp

This is an example query for ONE row:

UPDATE task_pool 
SET status=0, 
    expire_time = current_timestamp + (2 * interval '1 day')
WHERE task_id IN (
    SELECT task_id
    FROM task_pool
    WHERE order_id = *--order_id from CURRENT ROW OF REACTIVATE_TASK--*
          AND status=-1 
    ORDER BY expire_time
    LIMIT *--amount_to_reactivate from current row of reactivate_task--*

But with this query I have to iterate over each row in reactivate_task outside of Postgres (I'm using node and node-postgres, using a for loop on the results.rows[])

Is there a way to do this in Postgres in one query, instead of iterating over each row?


Solution

  • UPDATE task_pool t
    SET    status = 0
         , expire_time = current_timestamp + (interval '2 days')
    FROM   reactivate_task r
    CROSS  JOIN LATERAL (
       SELECT t1.task_id
       FROM   task_pool t1
       WHERE  t1.order_id = r.order_id
       AND    t1.status = -1
       ORDER  BY t1.expire_time
       LIMIT  r.amount_to_reactivate
       ) t1
    WHERE t.task_id = t1.task_id;
    

    fiddle

    Basics for UPDATE syntax in the manual.

    Identify the maximum number of candidate rows in a LATERAL subquery. See:

    Additional relations in the FROM clause to an update cannot join laterally to the target table, so you have to use a second instance of the same table in this special situation. Related:

    CROSS JOIN eliminates rows from reactivate_task where no qualifying rows to activate can be found.

    If amount_to_reactivate is null (which it probably should never be), there is effectively no limit.

    If (order_id, expire_time) of task_pool is not UNIQUE, ties will be broken arbitrarily, unless you add more expressions to the ORDER BY to make it deterministic.

    If there can be concurrent, competing writes on the table task_pool you may have to add a locking clause. But not in the same subquery, you'd add another subquery level above the one with the LIMIT. Related: