postgresqlsqlitetriggerssql-insertsql-function

How do I create a PostgreSQL TRIGGER to automatically calculate columns after INSERT?


I am trying to populate some calculated columns after each new row INSERT in PostgreSQL, but I can't get it to work.

I have already read these posts (I know these are for SQLite - this question was originally for SQLite but I have changed it to be for PostgreSQL - I will look at SQLite once I get PG version working):

  1. sqlite-trigger-after-insert
  2. sqlite-geopackage-after-insert-trigger-to-calculate-length-in-qgis

and

  1. PostgreSQL CREATE TRIGGER

but I want to know if I have to use a SELECT statement and the FOR EACH ROW statement within the TRIGGER because I only want to calculate values for the row just inserted, not all rows.

For more history, this question follows on from my previous question here:

Here is the table setup:

CREATE TABLE myTable (
    datetime TEXT,
    sys_id INT,
    cputil REAL,
    memfree REAL,
    sessnum INT,
    util_lag REAL,   -- to be calculated AFTER INSERT via TRIGGER
    mem_lag REAL,    -- to be calculated AFTER INSERT via TRIGGER
    util_diff REAL,  -- to be calculated AFTER INSERT via TRIGGER
    mem_diff REAL    -- to be calculated AFTER INSERT via TRIGGER
    util_change TEXT -- to be calculated AFTER INSERT via TRIGGER
);

Now for the TRIGGER setup. I have tried the following SQL TRIGGER syntax, but I cannot get it to work:

CREATE TRIGGER tr_fill_calculated_columns
    AFTER INSERT ON myTable
BEGIN
    UPDATE myTable
        SET util_lag = LAG(cputil) OVER (ORDER BY datetime),
        SET mem_lag = LAG(memfree) OVER (ORDER BY datetime),
        SET util_diff = cputil - util_lag,
        SET mem_diff = memfree - mem_lag,
        SET util_change = CASE
            WHEN util_diff > 0 THEN 'Up'
            WHEN util_diff < 0 THEN 'Down'
    WHERE datetime = NEW.datetime AND sys_id = NEW.sys_id
END;

I have also tried the following (which I think is closer to the way it needs to be done in PostgreSQL):

CREATE OR REPLACE FUNCTION fn_calculate_columns_after_insert
    RETURNS TRIGGER
    LANGUAGE PLPGSQL
    AS $$
BEGIN
    NEW.util_lag := (SELECT LAG(cputil) OVER (ORDER BY datetime) FROM myTable);
    NEW.mem_lag := (SELECT LAG(memfree) OVER (ORDER BY datetime) FROM myTable);
    NEW.util_diff := cputil - util_lag;
    NEW.mem_diff := memfree - mem_lag;
    IF NEW.util_diff > 0 THEN
        NEW.util_change := 'Up';
    IF NEW.util_diff < 0 THEN
        NEW.util_change := 'Down';
    ELSE
        NEW.util_change := '';
    RETURN NEW;
END;
$$

CREATE TRIGGER tr_fill_calculated_columns
    AFTER INSERT ON myTable
    FOR EACH ROW
    EXECUTE FUNCTION fn_calculate_columns_after_insert();

No matter what I try I keep getting SQL error [42601]: ERROR: syntax error at or near ";" or near some other character. This is driving me absolutely nuts. None of the official documentation or example tutorials give an example of exactly what I want to do, which I do not think is hard - update a simple calculated column based upon existing column values in a newly inserted row. Pretty basic in my view.

The full dataset contains hundreds of millions of rows, so performance is important - I only want the TRIGGER to calculate the last row immediately AFTER INSERT, and thereafter never query it again, that is, I only want the TRIGGER to act on the new row being inserted (each time a new row is inserted).

To help clarify exactly what I am looking for in the outcome, here is some sample PRE-TRIGGER data:

datetime,sys_id,cputil,memfree,sessnum
2019/05/03 08:06:14,100,0.57,0.51,47
2019/05/03 08:11:14,100,0.47,0.62,43
2019/05/03 08:16:14,100,0.56,0.57,62
2019/05/03 08:21:14,100,0.57,0.56,50
2019/05/03 08:26:14,100,0.35,0.46,43
2019/05/03 08:31:14,100,0.41,0.58,48
2019/05/03 08:36:14,100,0.57,0.35,58
2019/05/03 08:41:14,100,0.41,0.4,58
2019/05/03 08:46:14,100,0.53,0.35,62
2019/05/03 08:51:14,100,0.51,0.6,45
2019/05/03 08:56:14,100,0.32,0.37,47
2019/05/03 09:01:14,100,0.62,0.59,60
2019/05/03 09:06:14,100,0.66,0.72,57
2019/05/03 09:11:14,100,0.54,0.54,44
2019/05/03 09:16:14,100,0.29,0.4,47
2019/05/03 09:21:14,100,0.43,0.68,66
2019/05/03 09:26:14,100,0.49,0.66,65
2019/05/03 09:31:14,100,0.64,0.55,66
2019/05/03 09:36:14,100,0.42,0.6,42
2019/05/03 09:41:14,100,0.55,0.59,63

and here is a sample of the desired POST-TRIGGER result with calculated columns filled by the TRIGGER/FUNCTION:

datetime,sys_id,cputil,memfree,sessnum,util_lag,mem_lag,util_diff,mem_diff,util_change
2019/05/03 08:06:14,100,0.57,0.51,47,[NULL],[NULL],[NULL],[NULL],[NULL]
2019/05/03 08:11:14,100,0.47,0.62,43,0.57,0.51,-0.1,0.11,Down
2019/05/03 08:16:14,100,0.56,0.57,62,0.47,0.62,0.0900000000000001,-0.05,Up
2019/05/03 08:21:14,100,0.57,0.56,50,0.56,0.57,0.0099999999999999,-0.0099999999999999,Up
2019/05/03 08:26:14,100,0.35,0.46,43,0.57,0.56,-0.22,-0.1,Down
2019/05/03 08:31:14,100,0.41,0.58,48,0.35,0.46,0.06,0.12,Up
2019/05/03 08:36:14,100,0.57,0.35,58,0.41,0.58,0.16,-0.23,Up
2019/05/03 08:41:14,100,0.41,0.4,58,0.57,0.35,-0.16,0.05,Down
2019/05/03 08:46:14,100,0.53,0.35,62,0.41,0.4,0.12,-0.05,Up
2019/05/03 08:51:14,100,0.51,0.6,45,0.53,0.35,-0.02,0.25,Down
2019/05/03 08:56:14,100,0.32,0.37,47,0.51,0.6,-0.19,-0.23,Down
2019/05/03 09:01:14,100,0.62,0.59,60,0.32,0.37,0.3,0.22,Up
2019/05/03 09:06:14,100,0.66,0.72,57,0.62,0.59,0.04,0.13,Up
2019/05/03 09:11:14,100,0.54,0.54,44,0.66,0.72,-0.12,-0.18,Down
2019/05/03 09:16:14,100,0.29,0.4,47,0.54,0.54,-0.25,-0.14,Down
2019/05/03 09:21:14,100,0.43,0.68,66,0.29,0.4,0.14,0.28,Up
2019/05/03 09:26:14,100,0.49,0.66,65,0.43,0.68,0.06,-0.02,Up
2019/05/03 09:31:14,100,0.64,0.55,66,0.49,0.66,0.15,-0.11,Up
2019/05/03 09:36:14,100,0.42,0.6,42,0.64,0.55,-0.22,0.0499999999999999,Down
2019/05/03 09:41:14,100,0.55,0.59,63,0.42,0.6,0.13,-0.01,Up

Finally, I would like to know how to do this in SQLite too, i.e. what is the correct SQLite syntax to do exactly the same thing?


Solution

    1. In PL/pgSQL, the IF conditional statement requires a subsequent END IF. The error

      SQL error [42601]: ERROR: syntax error at or near ";"`
      

      was misleading because the parser was looking for that, found the nearest END which was the final one. It wanted an END IF, got an END; instead - hence the complaint.

    2. You had missing parentheses in the signature.

    3. The last semicolon in the function body is optional. The one right after it, is not optional.

    4. You treated NEW.x and x interchangeably. The field is only defined within the scope of a query where it's a column in one of the source tables. Outside of it, you need to refer to the pseudo-record NEW and its fields.

    5. You can use PL/pgSQL SELECT..INTO to populate both variables at once (that's slightly different from SQL SELECT..INTO which is a worse CREATE TABLE AS).

    6. Use IF..ELSIF..ELSIF..END IF; to specifically state what you want and when and avoid quietly handled null in ELSE.

    7. This has to happen in a TRIGGER..BEFORE INSERT. Doing this in an AFTER action you're not affecting the rows written into the table.

    8. This doesn't do what you expect it to:

      NEW.util_lag := (SELECT LAG(cputil) OVER (ORDER BY datetime) FROM myTable);
      

      it attempts to select all rows from the table, and in each, show the value from the previous one. It tried to assign the entire, slightly shifted set of rows to the variable.

    9. If you happen to insert an out-of-order row that'd squeeze in between two existing rows, it will get the correct values, but it would invalidate the one that was previously comparing against what is now 2 rows back. You might want to detect and handle this scenario as well as decrease the table and index fillfactor adequately, to always have some room to "slide-in" the out-of-order rows and update the ones that they invalidate.

    10. If you do large, multi-row inserts, a statement-level trigger should perform better - you could only fetch one last row from the table to start it off, then bulk-update the whole batch at once using the window function, the way you intially tried to.

    11. Switch the datetime column to timestamptz type to save space, speed things up, index properly and enable adequate functions and operators:

      ALTER TABLE myTable ALTER COLUMN datetime TYPE timestamptz
        USING (datetime::timestamptz);
      
    12. If you're struggling with timestamptz input format, you can use to_timestamp() with a specific mask: demo

      select to_timestamp('05.01.2010 15:08:44.997',
                          'DD.MM.YYYY HH24:MI:SS.MS');
      
      2010-01-05 15:08:44.997+00

      or you can change your DateStyle setting:

      set datestyle=ISO, DMY;
      select '13.01.2010 15:08:44.997'::timestamptz;
      
      2010-01-13 15:08:44.997+00
    13. Add an index on the datetime column, strap cputil and memfree fields to it as payload, increase its fillfactor from default 90 to 100%, then additionally CLUSTER the table on it:

      CREATE INDEX dtidx ON myTable (datetime) 
        INCLUDE (cputil, memfree)
        WITH (fillfactor=100);
      ALTER TABLE myTable CLUSTER ON dtidx;
      

      That way, the query that's looking for the latest row doesn't need to touch the table and finds the values right in the index. Since you're only ever adding more and more recent rows, you don't need the 10% empty space in the index that's normally there to accommodate updates/deletes: both the table and the index will just keep appending new entries. CLUSTER tells the db the table is meant to be physically written in the order corresponding to the index - in your case that happens naturally, but declaring that explicitly might help some execution plans.

    14. After some performance tests: INCLUDE might be unnecessary after all. It's doing an index-only scan, but since the table's being written to, it has to do a heap fetch to make sure the index payload matches the actual values in the row. So you might as well let it do a regular index scan by removing the payload, which might get you some more performance thanks to decreasing the overall index size, similar to how fillfactor did it.

    15. If you really only insert more recent rows, you can remove WHERE datetime < NEW.datetime - that way you'll just peek the most recent row from the table, removing the extra check and speeding things up a bit further. This breaks out-of-order inserts.

    Demo at db<>fiddle:

    CREATE OR REPLACE FUNCTION fn_calculate_columns_after_insert()--missing parentheses here
        RETURNS TRIGGER LANGUAGE PLPGSQL AS $f$
    BEGIN
        SELECT     cputil, memfree 
        INTO NEW.util_lag, NEW.mem_lag
        FROM myTable 
        --this condition isn't necessary if you always insert more recent data
        --WHERE datetime < NEW.datetime 
        ORDER BY datetime DESC
        LIMIT 1;
    
        NEW.util_diff := NEW.cputil  - NEW.util_lag;
        NEW.mem_diff  := NEW.memfree - NEW.mem_lag;
        --the three cases explicitly say what you want and when
        --ELSE no longer assumes the same case for 0 and null
        --null is possible if there were any nulls involved earlier
        IF NEW.util_diff > 0 THEN
            NEW.util_change := 'Up';
        ELSIF NEW.util_diff < 0 THEN
            NEW.util_change := 'Down';
        ELSIF NEW.util_diff = 0 THEN
            NEW.util_change := '';
        END IF;--this was missing
        RETURN NEW;
    END;--This semicolon is optional because it's immediately followed by closing dollar qoutes.
    $f$;--This semicolon was missing. It is not optional, because it ends the whole `create function` statement
    
    CREATE TRIGGER tr_fill_calculated_columns
        BEFORE INSERT ON myTable
        FOR EACH ROW
        EXECUTE FUNCTION fn_calculate_columns_after_insert();
    

    Here's what it comes up with if you try to insert your sample batch:

    datetime sys_id cputil memfree sessnum util_lag mem_lag util_diff mem_diff util_change
    2019/05/03 08:06:14 100 0.57 0.51 47 null null null null null
    2019/05/03 08:11:14 100 0.47 0.62 43 0.57 0.51 -0.099999994 0.110000014 Down
    2019/05/03 08:16:14 100 0.56 0.57 62 0.47 0.62 0.09 -0.050000012 Up
    2019/05/03 08:21:14 100 0.57 0.56 50 0.56 0.57 0.00999999 -0.00999999 Up
    2019/05/03 08:26:14 100 0.35 0.46 43 0.57 0.56 -0.22 -0.099999994 Down

    Performance test on 220k rows:

    No index index index..include index..include..fillfactor index..fillfactor removed where
    >30s
    (doesn't finish)
    9.7s 8.4s 8.5s 8.1s 7.4s