sqlsnowflake-cloud-data-platform

Rolling avg with dynamic preceding rows Snowflake SQL


This is a sample of my data and what I want to calculate is in the rightmost column using the other columns:

DATE Item Brand Cycle_length Brand_Avg_Cycle_Length_3_Weeks
13/09/2023 123 Apple 6
13/09/2023 500 Apple 5
20/09/2023 123 Apple 6
20/09/2023 500 Apple 5
27/09/2023 123 Apple 6 5.333333
27/09/2023 500 Apple 4 5.333333
04/10/2023 123 Apple 6 5.167777
04/10/2023 500 Apple 4 5.167777
13/09/2023 325 Samsung 7
13/09/2023 862 Samsung 3
13/09/2023 455 Samsung 5
20/09/2023 325 Samsung 7
20/09/2023 862 Samsung 3
27/09/2023 455 Samsung 5
04/10/2023 325 Samsung 7 5.333333
27/09/2023 862 Samsung 4 5.333333
04/10/2023 455 Samsung 7 5.333333
11/10/2023 325 Samsung 7 5.666667
04/10/2023 862 Samsung 4 5.666667
11/10/2023 455 Samsung 7 5.666667

The rightmost column is taking all the items in a brand and looking back 3 weeks including current (so 3 rows for each item) and calculating the average of the cycle length column. This is what I'm unable to calculate in Snowflake.

The bold values in the Cycle_length column are being used to calculate the bold values in the Brand_avg_cycle_length_3_weeks column. This is calculated for each item separately.

I've tried a number of things including partition by (with preceding) function but no luck. The blocker is that preceding only looks back a certain number of rows whereas I want to look back: number of weeks multiplied by number of items in brand rows back.

SELECT report_date
,item_cd
,recent_cycle_length
,brand_name
,max(report_date) as latestdate
,avg(recent_cycle_length) over 
    (partition by brand_name 
     order by report_date 
     rows between 11 preceding and current row) AS BRAND_AVG_CYCLE_LENGTH_LAST_12_WEEKS
FROM table
group by recent_cycle_length, brand_name, report_Date,item_cd
;

Solution

  • Because of Snowflake doesn't support a RANGE BETWEEN INTERVAL, the solution uses the trick from the KB "How to rewrite a RANGE BETWEEN INTERVAL query in Snowflake".

    drop table if exists "table";
    
    create temporary table "table" (
      report_date date,
      Item number(5),
      Brand_name varchar(30),
      Cycle_length number(3)
    );
    
    insert into "table" values
      ('2023-09-13', 123, 'Apple', 6),
      ('2023-09-13', 500, 'Apple', 5),
      ('2023-09-20', 123, 'Apple', 6),
      ('2023-09-20', 500, 'Apple', 5),
      ('2023-09-27', 123, 'Apple', 6),
      ('2023-09-27', 500, 'Apple', 4),
      ('2023-10-04', 123, 'Apple', 6),
      ('2023-10-04', 500, 'Apple', 4),
      ('2023-09-13', 325, 'Samsung', 7),
      ('2023-09-13', 862, 'Samsung', 3),
      ('2023-09-13', 455, 'Samsung', 5),
      ('2023-09-20', 325, 'Samsung', 7),
      ('2023-09-20', 862, 'Samsung', 3),
      ('2023-09-27', 455, 'Samsung', 5),
      ('2023-10-04', 325, 'Samsung', 7),
      ('2023-09-27', 862, 'Samsung', 4),
      ('2023-10-04', 455, 'Samsung', 7),
      ('2023-10-11', 325, 'Samsung', 7),
      ('2023-10-04', 862, 'Samsung', 4),
      ('2023-10-11', 455, 'Samsung', 7);
    
    select
      a.*,
      case datediff(week, min(b.report_date), max(b.report_date))
        when 2 then avg(b.Cycle_length)
      end as "3_weeks_avg"
    from "table" as a
    join "table" as b
      on b.Brand_name = a.Brand_name and
         b.report_date between a.report_date - interval '2 weeks' and
                               a.report_date
    group by a.report_date, a.Item, a.Brand_name, a.Cycle_length
    order by a.Brand_name, a.report_date;
    

    Result:

    +-------------+------+------------+--------------+-------------+
    | REPORT_DATE | ITEM | BRAND_NAME | CYCLE_LENGTH | 3_weeks_avg |
    +-------------+------+------------+--------------+-------------+
    | 2023-09-13  |  500 | Apple      |            5 |             |
    | 2023-09-13  |  123 | Apple      |            6 |             |
    | 2023-09-20  |  123 | Apple      |            6 |             |
    | 2023-09-20  |  500 | Apple      |            5 |             |
    | 2023-09-27  |  123 | Apple      |            6 | 5.333333    |
    | 2023-09-27  |  500 | Apple      |            4 | 5.333333    |
    | 2023-10-04  |  123 | Apple      |            6 | 5.166667    |
    | 2023-10-04  |  500 | Apple      |            4 | 5.166667    |
    | 2023-09-13  |  862 | Samsung    |            3 |             |
    | 2023-09-13  |  455 | Samsung    |            5 |             |
    | 2023-09-13  |  325 | Samsung    |            7 |             |
    | 2023-09-20  |  325 | Samsung    |            7 |             |
    | 2023-09-20  |  862 | Samsung    |            3 |             |
    | 2023-09-27  |  455 | Samsung    |            5 | 4.857143    |
    | 2023-09-27  |  862 | Samsung    |            4 | 4.857143    |
    | 2023-10-04  |  325 | Samsung    |            7 | 5.285714    |
    | 2023-10-04  |  455 | Samsung    |            7 | 5.285714    |
    | 2023-10-04  |  862 | Samsung    |            4 | 5.285714    |
    | 2023-10-11  |  325 | Samsung    |            7 | 5.857143    |
    | 2023-10-11  |  455 | Samsung    |            7 | 5.857143    |
    +-------------+------+------------+--------------+-------------+