I am trying to populate some calculated columns after each new row INSERT
in PostgreSQL
, but I can't get it to work.
I have already read these posts (I know these are for SQLite
- this question was originally for SQLite
but I have changed it to be for PostgreSQL
- I will look at SQLite
once I get PG
version working):
and
but I want to know if I have to use a SELECT
statement and the FOR EACH ROW
statement within the TRIGGER
because I only want to calculate values for the row just inserted, not all rows.
For more history, this question follows on from my previous question here:
Here is the table setup:
CREATE TABLE myTable (
datetime TEXT,
sys_id INT,
cputil REAL,
memfree REAL,
sessnum INT,
util_lag REAL, -- to be calculated AFTER INSERT via TRIGGER
mem_lag REAL, -- to be calculated AFTER INSERT via TRIGGER
util_diff REAL, -- to be calculated AFTER INSERT via TRIGGER
mem_diff REAL -- to be calculated AFTER INSERT via TRIGGER
util_change TEXT -- to be calculated AFTER INSERT via TRIGGER
);
Now for the TRIGGER
setup.
I have tried the following SQL TRIGGER
syntax, but I cannot get it to work:
CREATE TRIGGER tr_fill_calculated_columns
AFTER INSERT ON myTable
BEGIN
UPDATE myTable
SET util_lag = LAG(cputil) OVER (ORDER BY datetime),
SET mem_lag = LAG(memfree) OVER (ORDER BY datetime),
SET util_diff = cputil - util_lag,
SET mem_diff = memfree - mem_lag,
SET util_change = CASE
WHEN util_diff > 0 THEN 'Up'
WHEN util_diff < 0 THEN 'Down'
WHERE datetime = NEW.datetime AND sys_id = NEW.sys_id
END;
I have also tried the following (which I think is closer to the way it needs to be done in PostgreSQL
):
CREATE OR REPLACE FUNCTION fn_calculate_columns_after_insert
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
BEGIN
NEW.util_lag := (SELECT LAG(cputil) OVER (ORDER BY datetime) FROM myTable);
NEW.mem_lag := (SELECT LAG(memfree) OVER (ORDER BY datetime) FROM myTable);
NEW.util_diff := cputil - util_lag;
NEW.mem_diff := memfree - mem_lag;
IF NEW.util_diff > 0 THEN
NEW.util_change := 'Up';
IF NEW.util_diff < 0 THEN
NEW.util_change := 'Down';
ELSE
NEW.util_change := '';
RETURN NEW;
END;
$$
CREATE TRIGGER tr_fill_calculated_columns
AFTER INSERT ON myTable
FOR EACH ROW
EXECUTE FUNCTION fn_calculate_columns_after_insert();
No matter what I try I keep getting SQL error [42601]: ERROR: syntax error at or near ";"
or near some other character. This is driving me absolutely nuts. None of the official documentation or example tutorials give an example of exactly what I want to do, which I do not think is hard - update a simple calculated column based upon existing column values in a newly inserted row. Pretty basic in my view.
The full dataset contains hundreds of millions of rows, so performance is important - I only want the TRIGGER to calculate the last row immediately AFTER INSERT, and thereafter never query it again, that is, I only want the TRIGGER to act on the new row being inserted (each time a new row is inserted).
To help clarify exactly what I am looking for in the outcome, here is some sample PRE-TRIGGER data:
datetime,sys_id,cputil,memfree,sessnum
2019/05/03 08:06:14,100,0.57,0.51,47
2019/05/03 08:11:14,100,0.47,0.62,43
2019/05/03 08:16:14,100,0.56,0.57,62
2019/05/03 08:21:14,100,0.57,0.56,50
2019/05/03 08:26:14,100,0.35,0.46,43
2019/05/03 08:31:14,100,0.41,0.58,48
2019/05/03 08:36:14,100,0.57,0.35,58
2019/05/03 08:41:14,100,0.41,0.4,58
2019/05/03 08:46:14,100,0.53,0.35,62
2019/05/03 08:51:14,100,0.51,0.6,45
2019/05/03 08:56:14,100,0.32,0.37,47
2019/05/03 09:01:14,100,0.62,0.59,60
2019/05/03 09:06:14,100,0.66,0.72,57
2019/05/03 09:11:14,100,0.54,0.54,44
2019/05/03 09:16:14,100,0.29,0.4,47
2019/05/03 09:21:14,100,0.43,0.68,66
2019/05/03 09:26:14,100,0.49,0.66,65
2019/05/03 09:31:14,100,0.64,0.55,66
2019/05/03 09:36:14,100,0.42,0.6,42
2019/05/03 09:41:14,100,0.55,0.59,63
and here is a sample of the desired POST-TRIGGER result with calculated columns filled by the TRIGGER/FUNCTION:
datetime,sys_id,cputil,memfree,sessnum,util_lag,mem_lag,util_diff,mem_diff,util_change
2019/05/03 08:06:14,100,0.57,0.51,47,[NULL],[NULL],[NULL],[NULL],[NULL]
2019/05/03 08:11:14,100,0.47,0.62,43,0.57,0.51,-0.1,0.11,Down
2019/05/03 08:16:14,100,0.56,0.57,62,0.47,0.62,0.0900000000000001,-0.05,Up
2019/05/03 08:21:14,100,0.57,0.56,50,0.56,0.57,0.0099999999999999,-0.0099999999999999,Up
2019/05/03 08:26:14,100,0.35,0.46,43,0.57,0.56,-0.22,-0.1,Down
2019/05/03 08:31:14,100,0.41,0.58,48,0.35,0.46,0.06,0.12,Up
2019/05/03 08:36:14,100,0.57,0.35,58,0.41,0.58,0.16,-0.23,Up
2019/05/03 08:41:14,100,0.41,0.4,58,0.57,0.35,-0.16,0.05,Down
2019/05/03 08:46:14,100,0.53,0.35,62,0.41,0.4,0.12,-0.05,Up
2019/05/03 08:51:14,100,0.51,0.6,45,0.53,0.35,-0.02,0.25,Down
2019/05/03 08:56:14,100,0.32,0.37,47,0.51,0.6,-0.19,-0.23,Down
2019/05/03 09:01:14,100,0.62,0.59,60,0.32,0.37,0.3,0.22,Up
2019/05/03 09:06:14,100,0.66,0.72,57,0.62,0.59,0.04,0.13,Up
2019/05/03 09:11:14,100,0.54,0.54,44,0.66,0.72,-0.12,-0.18,Down
2019/05/03 09:16:14,100,0.29,0.4,47,0.54,0.54,-0.25,-0.14,Down
2019/05/03 09:21:14,100,0.43,0.68,66,0.29,0.4,0.14,0.28,Up
2019/05/03 09:26:14,100,0.49,0.66,65,0.43,0.68,0.06,-0.02,Up
2019/05/03 09:31:14,100,0.64,0.55,66,0.49,0.66,0.15,-0.11,Up
2019/05/03 09:36:14,100,0.42,0.6,42,0.64,0.55,-0.22,0.0499999999999999,Down
2019/05/03 09:41:14,100,0.55,0.59,63,0.42,0.6,0.13,-0.01,Up
Finally, I would like to know how to do this in SQLite too, i.e. what is the correct SQLite syntax to do exactly the same thing?
In PL/pgSQL, the IF
conditional statement requires a subsequent END IF
. The error
SQL error [42601]: ERROR: syntax error at or near ";"`
was misleading because the parser was looking for that, found the nearest END
which was the final one. It wanted an END IF
, got an END;
instead - hence the complaint.
You had missing parentheses in the signature.
The last semicolon in the function body is optional. The one right after it, is not optional.
You treated NEW.x
and x
interchangeably. The field is only defined within the scope of a query where it's a column in one of the source tables. Outside of it, you need to refer to the pseudo-record NEW
and its fields.
You can use PL/pgSQL SELECT..INTO
to populate both variables at once (that's slightly different from SQL SELECT..INTO
which is a worse CREATE TABLE AS
).
Use IF..ELSIF..ELSIF..END IF;
to specifically state what you want and when and avoid quietly handled null
in ELSE
.
This has to happen in a TRIGGER..
BEFORE
INSERT
. Doing this in an AFTER
action you're not affecting the rows written into the table.
This doesn't do what you expect it to:
NEW.util_lag := (SELECT LAG(cputil) OVER (ORDER BY datetime) FROM myTable);
it attempts to select all rows from the table, and in each, show the value from the previous one. It tried to assign the entire, slightly shifted set of rows to the variable.
If you happen to insert an out-of-order row that'd squeeze in between two existing rows, it will get the correct values, but it would invalidate the one that was previously comparing against what is now 2 rows back. You might want to detect and handle this scenario as well as decrease the table and index fillfactor
adequately, to always have some room to "slide-in" the out-of-order rows and update the ones that they invalidate.
If you do large, multi-row inserts, a statement-level trigger should perform better - you could only fetch one last row from the table to start it off, then bulk-update the whole batch at once using the window function, the way you intially tried to.
Switch the datetime
column to timestamptz
type to save space, speed things up, index properly and enable adequate functions and operators:
ALTER TABLE myTable ALTER COLUMN datetime TYPE timestamptz
USING (datetime::timestamptz);
If you're struggling with timestamptz
input format, you can use to_timestamp()
with a specific mask: demo
select to_timestamp('05.01.2010 15:08:44.997',
'DD.MM.YYYY HH24:MI:SS.MS');
2010-01-05 15:08:44.997+00 |
or you can change your DateStyle
setting:
set datestyle=ISO, DMY;
select '13.01.2010 15:08:44.997'::timestamptz;
2010-01-13 15:08:44.997+00 |
Add an index on the datetime
column, strap cputil
and memfree
fields to it as payload, increase its fillfactor
from default 90 to 100%, then additionally CLUSTER
the table on it:
CREATE INDEX dtidx ON myTable (datetime)
INCLUDE (cputil, memfree)
WITH (fillfactor=100);
ALTER TABLE myTable CLUSTER ON dtidx;
That way, the query that's looking for the latest row doesn't need to touch the table and finds the values right in the index. Since you're only ever adding more and more recent rows, you don't need the 10% empty space in the index that's normally there to accommodate updates/deletes: both the table and the index will just keep appending new entries. CLUSTER
tells the db the table is meant to be physically written in the order corresponding to the index - in your case that happens naturally, but declaring that explicitly might help some execution plans.
After some performance tests: INCLUDE
might be unnecessary after all. It's doing an index-only scan, but since the table's being written to, it has to do a heap fetch to make sure the index payload matches the actual values in the row. So you might as well let it do a regular index scan by removing the payload, which might get you some more performance thanks to decreasing the overall index size, similar to how fillfactor
did it.
If you really only insert more recent rows, you can remove WHERE datetime < NEW.datetime
- that way you'll just peek the most recent row from the table, removing the extra check and speeding things up a bit further. This breaks out-of-order inserts.
CREATE OR REPLACE FUNCTION fn_calculate_columns_after_insert()--missing parentheses here
RETURNS TRIGGER LANGUAGE PLPGSQL AS $f$
BEGIN
SELECT cputil, memfree
INTO NEW.util_lag, NEW.mem_lag
FROM myTable
--this condition isn't necessary if you always insert more recent data
--WHERE datetime < NEW.datetime
ORDER BY datetime DESC
LIMIT 1;
NEW.util_diff := NEW.cputil - NEW.util_lag;
NEW.mem_diff := NEW.memfree - NEW.mem_lag;
--the three cases explicitly say what you want and when
--ELSE no longer assumes the same case for 0 and null
--null is possible if there were any nulls involved earlier
IF NEW.util_diff > 0 THEN
NEW.util_change := 'Up';
ELSIF NEW.util_diff < 0 THEN
NEW.util_change := 'Down';
ELSIF NEW.util_diff = 0 THEN
NEW.util_change := '';
END IF;--this was missing
RETURN NEW;
END;--This semicolon is optional because it's immediately followed by closing dollar qoutes.
$f$;--This semicolon was missing. It is not optional, because it ends the whole `create function` statement
CREATE TRIGGER tr_fill_calculated_columns
BEFORE INSERT ON myTable
FOR EACH ROW
EXECUTE FUNCTION fn_calculate_columns_after_insert();
Here's what it comes up with if you try to insert your sample batch:
datetime | sys_id | cputil | memfree | sessnum | util_lag | mem_lag | util_diff | mem_diff | util_change |
---|---|---|---|---|---|---|---|---|---|
2019/05/03 08:06:14 | 100 | 0.57 | 0.51 | 47 | null | null | null | null | null |
2019/05/03 08:11:14 | 100 | 0.47 | 0.62 | 43 | 0.57 | 0.51 | -0.099999994 | 0.110000014 | Down |
2019/05/03 08:16:14 | 100 | 0.56 | 0.57 | 62 | 0.47 | 0.62 | 0.09 | -0.050000012 | Up |
2019/05/03 08:21:14 | 100 | 0.57 | 0.56 | 50 | 0.56 | 0.57 | 0.00999999 | -0.00999999 | Up |
2019/05/03 08:26:14 | 100 | 0.35 | 0.46 | 43 | 0.57 | 0.56 | -0.22 | -0.099999994 | Down |
Performance test on 220k rows: