Using PostgreSQL 16.2, I have two tables:
task_pool
table
create table task_pool (
task_id serial primary key
, order_id integer
, user_id integer
, expire_time timestamp
, status integer default 0
);
Sample data:
task_id | order_id | user_id | expire_time | status |
---|---|---|---|---|
50 | 80 | 1 | 2024-03-00 11:52:44 | -1(expired) |
51 | 81 | 1 | 2024-03-00 16:00:44 | -1 (expired) |
52 | 80 | 2 | 2024-04-19 12:31:22 | 0 (active) |
53 | 81 | 4 | 2024-03-00 16:52:44 | -1(expired) |
54 | 81 | 6 | 2024-03-00 16:51:44 | -1(expired) |
reactivate_task
table this table is actually a temporary table using WITH AS in a longer query, but for simplicity here is code to create the table:
create table reactivate_task(
, order_id integer
, successfully_added
, total_amount_requested
, amount_to_reactivate integer
);
Sample data:
order_id | successfully_added | total_amount_requested | amount_to_reactivate |
---|---|---|---|
80 | 0 | 3 | 3 |
81 | 0 | 3 | 2 |
What I'm trying to do is, for each row in reactivate_task
, update the x
oldest expired tasks with the related order_id
, and x
being that row's amount_to_reactivate
So in the example tables above, looking at reactivate_task
table, the 3 oldest expired tasks with order_id 80 should be "reactivated". And the 2 oldest expired_tasks with order_id 81 should be "reactivated".
There are the intended results in table task_pool
:
task_id | order_id | user_id | expire_time | status |
---|---|---|---|---|
50 | 80 | 1 | 2024-03-21 12:00:00 | 0(active) --status set to 0 and expire_time to current_timestamp |
51 | 81 | 1 | 2024-03-21 12:00:00 | 0(active) ----status set to 0 and expire_time to current_timestamp |
52 | 80 | 2 | 2024-04-19 12:31:22 | 0 (active) --NOT updated (even though amount_to_reactive was 3 for order_id 80, there was only ONE expired task (task_id 50) that was expired. |
53 | 81 | 4 | 2024-03-19 16:52:44 | -1(expired) --NOT updated (amount_to_reactive was only 2 for order 81, so only the 2 rows with oldest expire_time were updated (task_id 50 & 54) |
54 | 81 | 6 | 2024-03-21 12:00:00 | 0 (active) ----status set to 0 and expire_time to current_timestamp |
This is an example query for ONE row:
UPDATE task_pool
SET status=0,
expire_time = current_timestamp + (2 * interval '1 day')
WHERE task_id IN (
SELECT task_id
FROM task_pool
WHERE order_id = *--order_id from CURRENT ROW OF REACTIVATE_TASK--*
AND status=-1
ORDER BY expire_time
LIMIT *--amount_to_reactivate from current row of reactivate_task--*
But with this query I have to iterate over each row in reactivate_task
outside of Postgres (I'm using node and node-postgres, using a for loop on the results.rows[]
)
Is there a way to do this in Postgres in one query, instead of iterating over each row?
UPDATE task_pool t
SET status = 0
, expire_time = current_timestamp + (interval '2 days')
FROM reactivate_task r
CROSS JOIN LATERAL (
SELECT t1.task_id
FROM task_pool t1
WHERE t1.order_id = r.order_id
AND t1.status = -1
ORDER BY t1.expire_time
LIMIT r.amount_to_reactivate
) t1
WHERE t.task_id = t1.task_id;
Basics for UPDATE
syntax in the manual.
Identify the maximum number of candidate rows in a LATERAL
subquery. See:
Additional relations in the FROM
clause to an update cannot join laterally to the target table, so you have to use a second instance of the same table in this special situation. Related:
CROSS JOIN
eliminates rows from reactivate_task
where no qualifying rows to activate can be found.
If amount_to_reactivate
is null (which it probably should never be), there is effectively no limit.
If (order_id, expire_time)
of task_pool
is not UNIQUE
, ties will be broken arbitrarily, unless you add more expressions to the ORDER BY
to make it deterministic.
If there can be concurrent, competing writes on the table task_pool
you may have to add a locking clause. But not in the same subquery, you'd add another subquery level above the one with the LIMIT
. Related: