databasepostgresqlconcurrencylockingplpgsql

Locking write operations on tables while Postgres procedure executes


I have created a procedure in Postgres that will create a table from the contents of three other tables (SELECT with JOINs). Then the resulting table is altered adding two new columns. Finally, 3 triggers are created on those 3 source tables, so any new writes are propagated to the new table.

I know the procedure itself is atomic and transactional from its own scope. But the procedure has no means of knowing anything about the three tables. I'm afraid that in the time between the new table creation and the creation of the triggers, some writes to the existing tables could happen thus de-synchronizing the new table, which won't register those writes. That cannot happen.

My table creation/trigger creation procedure looks like this:

CREATE OR REPLACE PROCEDURE myschema.table_creation()
LANGUAGE plpgsql
AS $procedure$
BEGIN
create table newtable as
SELECT * FROM myschema.session a NATURAL JOIN (SELECT * FROM myschema.message b NATURAL left JOIN myschema.data) as d;
ALTER TABLE myschema.newtable ADD created_at timestamp;
ALTER TABLE myschema.newtable ADD source text;
CREATE TRIGGER mytrigger
after INSERT
ON myschema.session
FOR EACH ROW
EXECUTE PROCEDURE myschema.trigger_proc();
CREATE TRIGGER mytrigger
after INSERT
ON myschema.messages
FOR EACH ROW
EXECUTE PROCEDURE myschema.trigger_proc();
CREATE TRIGGER mytrigger
after INSERT
ON myschema.data
FOR EACH ROW
EXECUTE PROCEDURE myschema.trigger_proc();
END;
$procedure$; 

How can I lock the writes in the existing tables to postpone them until the whole table_creation() procedure has finished? Otherwise, I would have a race condition and rows could be lost in the new table. I don't think my procedure in its current state has any protection against writes.


Solution

  • To be absolutely sure, LOCK the three source tables exclusively first.
    I suggest a SHARE lock. The manual:

    [...] This mode protects a table against concurrent data changes.

    BEGIN;
    
    LOCK TABLE myschema.session, myschema.message, myschema.data IN SHARE MODE;  -- ① !!!
    
    CREATE TABLE newtable AS
    SELECT *
    FROM   myschema.session a
    NATURAL JOIN (
       SELECT *
       FROM   myschema.message b
       NATURAL LEFT JOIN myschema.data
       ) d;
    
    ALTER TABLE myschema.newtable  -- ② single command
      ADD created_at timestamp
    , ADD source text;
    
    CREATE TRIGGER mytrigger
    AFTER INSERT ON myschema.session
    FOR EACH ROW EXECUTE FUNCTION myschema.trigger_proc();  -- ③ it's really a function
    
    CREATE TRIGGER mytrigger
    AFTER INSERT ON myschema.messages
    FOR EACH ROW EXECUTE FUNCTION myschema.trigger_proc();
    
    CREATE TRIGGER mytrigger
    AFTER INSERT ON myschema.data
    FOR EACH ROW EXECUTE FUNCTION myschema.trigger_proc();
    
    COMMIT;
    

    Since this obviously is a one-time operation with no PL/pgSQL-specific operations, I wouldn't create a procedure - or even use a PL/pgSQL code block at all. A plain transaction does the job.

    Concurrent attempts to write to the table will have to wait until your transaction has finished.
    But reads are not hindered at all.

    ① Note that the CREATE TRIGGERcommand acquires the slightly stricter lock SHARE ROW EXCLUSIVE, but we don't strictly need that lock strength earlier. All locks are released at the end of the transaction.

    ② A single ALTER is slightly cheaper, and possible for this.

    ③ Assuming at least Postgres 11. See:

    Accordingly, I wouldn't use the name trigger_proc for the trigger function - though obviously just symbolic here.