Below is the given table.
date | id1 | id2 | fact | boolean |
---|---|---|---|---|
2022-01-01 | i1 | l1 | 100 | 0 |
2022-01-02 | i1 | l1 | 10 | 1 |
2022-01-03 | i1 | l1 | 110 | 0 |
2022-01-04 | i1 | l1 | 70 | 0 |
2022-01-05 | i1 | l1 | 20 | 1 |
2022-01-05 | i1 | l1 | 30 | 1 |
2022-01-06 | i1 | l1 | 40 | 0 |
2022-01-07 | i1 | l1 | 50 | 0 |
2022-01-08 | i1 | l1 | 70 | 0 |
2022-01-09 | i1 | l1 | 100 | 1 |
2022-01-09 | i1 | l1 | 10 | 1 |
2022-01-10 | i1 | l1 | 200 | 0 |
Expected outcome for rolling average:
date | id1 | id2 | fact | boolean | rolling_avg |
---|---|---|---|---|---|
2022-01-01 | i1 | l1 | 100 | 0 | |
2022-01-02 | i1 | l1 | 10 | 1 | 100 |
2022-01-03 | i1 | l1 | 110 | 0 | |
2022-01-04 | i1 | l1 | 70 | 0 | |
2022-01-05 | i1 | l1 | 20 | 1 | 90 |
2022-01-05 | i1 | l1 | 30 | 1 | 90 |
2022-01-06 | i1 | l1 | 40 | 0 | |
2022-01-07 | i1 | l1 | 50 | 0 | |
2022-01-08 | i1 | l1 | 70 | 0 | |
2022-01-09 | i1 | l1 | 100 | 1 | 53.33 |
2022-01-09 | i1 | l1 | 10 | 1 | 53.33 |
2022-01-10 | i1 | l1 | 200 | 0 |
Rolling avg of 'fact' column for preceding 3 days required only for rows where boolean = 1 and while calculating the average consider only those values where boolean is 0.
Example1: for row 5, preceding 3 days are 4, 3 and 2 where boolean is 0 for 4 and 3. Hence (110+70/2) = 90
Example2: for row 9, preceding 3 days are 8, 7 and 6 where all have boolean = 0. Hence (70+50+40/3) = 53.33
I've tried doing
AVG(expression) OVER (PARTITION BY id1, id2
ORDER BY date DESC
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) as rolling_avg
However the result I got didn't match the required outcome.
If you dates may have gaps, then I would recommend a lateral join:
select t.*, a.*
from mytable t
outer apply (
select avg(t1.fact) rolling_avg
from mytable t1
where t.boolean = 1
and t1.id1 = t.id1 and t1.id2 = t.id2
and t1.date >= dateadd(day, -3, t.date) and t1.date < t.date
and t1.boolean = 0
) a
The subquery selects rows that belong to the same ids over the 3 previous days and computes the average fact, with respect to the boolean flag of the current and previous rows.
If there is always one and only one fact row per date for a given tuple of ids, then window functions with a row frame are relevant (and more efficient):
select t.*,
case when boolean = 1 then
avg(case when boolean = 0 then fact end) over(
partition by id1, id2
order by date
rows between 3 preceding and 1 preceding
)
end rolling_avg
from mytable t