sqlpostgresqlsqlalchemyrowlocking

Atomic Select & Update. Make row invisible or unselectable by more than one process


The situation:

PostgresSQL database. Application with SQL Alchemy ORM (not really important). Table with millions of rows.

Hundreds of processes access the database with that table. Each wants to select one row and perform a relatively expensive operation based on its content, then populate other tables and update that row.

The naïve approach which I used is like this:

SELECT * FROM table WHERE status = 'free';

and then right after that:

UPDATE table SET status 'in_process';

Now the problem is that those operations are not atomic, meaning that in the time between the SELECT and the UPDATE, up to 5 other processes can select that row and begin working on it (which, I remind you, is pretty expensive).

Now I know there is SELECT FOR UPDATE which locks rows. But it locks them FOR UPDATE (duh), it doesn't forbid rows from being selected.

So I guess it has to be a pretty common issue, but googling didn't helped much.


Solution

  • Seems like this is a way to solve this:

    Using python and sqlalchemy (but it's not nessesary, since i use raw SQL anyway)

    from sqlalchemy import text
    sql = text("UPDATE table 
                SET status = 'in_process' 
                WHERE column.id = (SELECT column.id 
                                   FROM table 
                                   WHERE status='free' 
                                   AND pg_try_advisory_xact_lock(column.id) 
                                   LIMIT 1 FOR UPDATE) 
                RETURNING *"
    row = next(iter(engine.execution_options(autocommit=True).execute(sql)))
    # Now row is a tuple of values