I have a table of numbers in a Postgres database, and I'm trying to calculate the range (max-min) of the 3 preceding and 3 following rows, for each row. Given a test table:
CREATE TABLE test_table AS (
SELECT
generate_series(1,100) id,
random()*generate_series(1,100) val
);
This can be done with the following query:
SELECT
id,
MAX(val) OVER (
ORDER BY id
ROWS
BETWEEN 3 PRECEDING
AND 3 FOLLOWING
) -
MIN(val) OVER (
ORDER BY id
ROWS
BETWEEN 3 PRECEDING
AND 3 FOLLOWING
) val_range
FROM test_table;
But the OVER
clause is duplicated. Is there a way to improve the query and remove the duplicate? Would the duplicate impact performance? How should I handle the moving window if I wanted to calculate other aggregates for each id?
I've come up with the following query to reuse the window, but I'm not sure if it's the optimal approach.
WITH
windowed AS (
SELECT
id,
UNNEST(ARRAY_AGG(val) OVER (
ORDER BY id
ROWS
BETWEEN 3 PRECEDING
AND 3 FOLLOWING
)) val
FROM test_table
)
SELECT
id,
MAX(val)-MIN(val) val_range,
AVG(val) val_mean,
STDDEV(val) val_stddev
FROM windowed
GROUP BY id
ORDER BY id;
You can use a WINDOW
clause to shorten your code:
SELECT id
, MAX(val) OVER w
- MIN(val) OVER w AS val_range
FROM test_table
WINDOW w AS (ORDER BY id ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING);
But that hardly affects performance at all. Postgres will reuse the window frame if you spell it out repeatedly as well. The manual:
When multiple window functions are used, all the window functions having syntactically equivalent
PARTITION BY
andORDER BY
clauses in their window definitions are guaranteed to be evaluated in a single pass over the data.
Related: