I want to get the previous value using lag()
, over a partition defined with RANGE BETWEEN
. I followed an example from the documentation:
WITH orders (custkey, orderdate, totalprice)
AS
(
VALUES
('cust_1', DATE '2020-10-10', 100),
('cust_2', DATE '2020-10-10', 15),
('cust_1', DATE '2020-10-15', 200),
('cust_1', DATE '2020-10-16', 240),
('cust_2', DATE '2020-12-20', 25),
('cust_1', DATE '2020-12-25', 140),
('cust_2', DATE '2021-01-01', 5)
)
SELECT *,
avg(totalprice) OVER (
PARTITION BY custkey
ORDER BY orderdate
RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW) AS past_month_avg,
lag(totalprice) OVER ( PARTITION BY custkey
ORDER BY orderdate
RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW) AS previous_total_price_within_last_month
FROM orders
It returns:
custkey | orderdate | totalprice | past_month_avg | previous_total_price_within_last_month |
---|---|---|---|---|
cust_2 | 2020-10-10 | 15 | 15.0 | NULL |
cust_2 | 2020-12-20 | 25 | 25.0 | 15 |
cust_2 | 2021-01-01 | 5 | 15.0 | 25 |
cust_1 | 2020-10-10 | 100 | 100.0 | NULL |
cust_1 | 2020-10-15 | 200 | 150.0 | 100 |
cust_1 | 2020-10-16 | 240 | 180.0 | 200 |
cust_1 | 2020-12-25 | 140 | 140.0 | 240 |
The problem: while the calculation for past_month_avg
results as expected, the previous_total_price_within_last_month
result is not as expected.
Expected Output
Given that I defined a window that ranges at the last month, I expect that lag()
will return null
if the "previous" totalprice
value is associated with an orderdate
value that is out of the 1-month-back window.
custkey | orderdate | totalprice | past_month_avg | previous_total_price_within_last_month |
---|---|---|---|---|
cust_2 | 2020-10-10 | 15 | 15.0 | NULL |
cust_2 | 2020-12-20 | 25 | 25.0 | NULL* |
cust_2 | 2021-01-01 | 5 | 15.0 | 25 |
cust_1 | 2020-10-10 | 100 | 100.0 | NULL |
cust_1 | 2020-10-15 | 200 | 150.0 | 100 |
cust_1 | 2020-10-16 | 240 | 180.0 | 200 |
cust_1 | 2020-12-25 | 140 | 140.0 | NULL* |
*expected value
As written in the comment, lag
isn't expected to be supported when using a window frame (e.g., range between
).
The following functions are not supported either:
lead
, ntile
, rank
, dense_rank
, percent_rank
, cume_dist
, and row_number
.
Please refer to this issue in the Trino project, and to the subsequent pull request.