sqlsql-serversubquerywindow-functionslateral-join

Calculate rolling average of a measure based on certain condition from another column


Below is the given table.

date id1 id2 fact boolean
2022-01-01 i1 l1 100 0
2022-01-02 i1 l1 10 1
2022-01-03 i1 l1 110 0
2022-01-04 i1 l1 70 0
2022-01-05 i1 l1 20 1
2022-01-05 i1 l1 30 1
2022-01-06 i1 l1 40 0
2022-01-07 i1 l1 50 0
2022-01-08 i1 l1 70 0
2022-01-09 i1 l1 100 1
2022-01-09 i1 l1 10 1
2022-01-10 i1 l1 200 0

Expected outcome for rolling average:

date id1 id2 fact boolean rolling_avg
2022-01-01 i1 l1 100 0
2022-01-02 i1 l1 10 1 100
2022-01-03 i1 l1 110 0
2022-01-04 i1 l1 70 0
2022-01-05 i1 l1 20 1 90
2022-01-05 i1 l1 30 1 90
2022-01-06 i1 l1 40 0
2022-01-07 i1 l1 50 0
2022-01-08 i1 l1 70 0
2022-01-09 i1 l1 100 1 53.33
2022-01-09 i1 l1 10 1 53.33
2022-01-10 i1 l1 200 0

Rolling avg of 'fact' column for preceding 3 days required only for rows where boolean = 1 and while calculating the average consider only those values where boolean is 0.

Example1: for row 5, preceding 3 days are 4, 3 and 2 where boolean is 0 for 4 and 3. Hence (110+70/2) = 90

Example2: for row 9, preceding 3 days are 8, 7 and 6 where all have boolean = 0. Hence (70+50+40/3) = 53.33

I've tried doing

AVG(expression) OVER (PARTITION BY id1, id2 
                      ORDER BY date DESC 
                      ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) as rolling_avg

However the result I got didn't match the required outcome.


Solution

  • If you dates may have gaps, then I would recommend a lateral join:

    select t.*, a.*
    from mytable t
    outer apply (
        select avg(t1.fact) rolling_avg
        from mytable t1
        where t.boolean = 1
            and t1.id1 = t.id1 and t1.id2 = t.id2 
            and t1.date >= dateadd(day, -3, t.date) and t1.date < t.date
            and t1.boolean = 0
    ) a
    

    The subquery selects rows that belong to the same ids over the 3 previous days and computes the average fact, with respect to the boolean flag of the current and previous rows.

    If there is always one and only one fact row per date for a given tuple of ids, then window functions with a row frame are relevant (and more efficient):

    select t.*, 
        case when boolean = 1 then
            avg(case when boolean = 0 then fact end) over(
                partition by id1, id2 
                order by date
                rows between 3 preceding and 1 preceding
            ) 
        end rolling_avg
    from mytable t