snowflake-cloud-data-platformlag

using Snowflake LAG() on 2 year columns


I have a logic to get the amount from previous years and was trying with LAG() function on 2 columns in a table and derive 2 columns (one for offset 1 and second for offset 2),

I have invoice year, invoice month, vendor id, vendor site, audit year, audit month and amount columns as attached.

create table invoice_audit_data (invoice_year number, invoice_month number, vendor number, site number, audit_year number, audit_month number, amount number);

Insert into INVOICE_AUDIT_DATA (INVOICE_YEAR, INVOICE_MONTH, VENDOR, SITE, AUDIT_YEAR, AUDIT_MONTH, AMOUNT) 
values 
    (2019,11,248,2,2022,1,9162)
    ,(2019,11,248,2,2022,2,9529)
    ,(2019,11,248,2,2022,3,548) 
    ,(2019,11,248,2,2022,4,7796)
    ,(2019,11,248,2,2022,5,4820)
    ,(2019,11,248,2,2022,6,6376)
    ,(2019,11,248,2,2022,7,947)
    ,(2019,11,248,2,2022,8,3930)
    ,(2020,11,248,2,2022,1,9280)
    ,(2020,11,248,2,2022,2,3969)
    ,(2020,11,248,2,2022,3,3156)
    ,(2020,11,248,2,2022,4,7900)
    ,(2020,11,248,2,2022,5,2710)
    ,(2020,11,248,2,2022,6,9959)
    ,(2020,11,248,2,2022,7,2870)
    ,(2020,11,248,2,2022,8,8611)
    ,(2020,11,248,2,2022,9,1614)
    ,(2020,11,248,2,2022,10,7357)
    ,(2020,11,248,2,2022,11,3251)
    ,(2020,11,248,2,2022,12,8215)
    ,(2020,11,248,2,2023,1,7967)
    ,(2020,11,248,2,2023,2,2514)
    ,(2020,11,248,2,2023,3,114)
    ,(2021,11,248,2,2022,1,3446)
    ,(2021,11,248,2,2022,2,6165)
    ,(2021,11,248,2,2022,3,102)
    ,(2021,11,248,2,2022,4,8748)
    ,(2021,11,248,2,2022,5,6918)
    ,(2021,11,248,2,2022,6,6340)
    ,(2021,11,248,2,2022,7,2819)
    ,(2021,11,248,2,2022,8,255)
    ,(2021,11,248,2,2022,9,8121)
    ,(2021,11,248,2,2022,10,9784)
    ,(2021,11,248,2,2022,11,2604)
    ,(2021,11,248,2,2022,12,881)
    ,(2021,11,248,2,2023,1,2482)
    ,(2021,11,248,2,2023,2,9474)
    ,(2021,11,248,2,2023,3,1662)
    ,(2021,11,248,2,2023,4,8422);

I tired using LAG() to go back invoice year and then use another LAG() to go back on audit year, but in the data invoice year 2020 has only 3 months in audit year 2023, so my result is 0.

  1. I want to have write a CTE using this table, calculate the amount from previous invoice year/month and previous audit year/month and display in main select. for example, if my invoice year-mm is 2021-11 and audit year-mm is 2023-4, this column would get the amount from invoice year-mm 2020-11 and audit year-mm 2022-4. So the LAG() will have to be applied on both invoice year and audit year.
  2. I want to add another column to do the same thing to go back 2 years.

Solution

  • The simplest way to solve is via JOIN's thus:

    select a.*
        ,b.audit_year 
        ,b.audit_month
        ,b.amount
    from INVOICE_AUDIT_DATA as a
    left join INVOICE_AUDIT_DATA as b 
        on a.vendor = b.vendor and a.site = b.site
         and a.invoice_year = b.invoice_year and a.invoice_month = b.invoice_month
         and a.audit_year-1 = b.audit_year and a.audit_month = b.audit_month
    order by 1,2,5,6;
    

    enter image description here

    thus 2 years is:

    select a.*
        ,b.audit_year as ay_m1
        --,b.audit_month 
        ,b.amount as ay_m1_amount
        ,c.audit_year as ay_m2
        --,c.audit_month
        ,c.amount as ay_m2_amount
    from INVOICE_AUDIT_DATA as a
    left join INVOICE_AUDIT_DATA as b 
        on a.vendor = b.vendor and a.site = b.site
         and a.invoice_year = b.invoice_year and a.invoice_month = b.invoice_month
         and a.audit_year-1 = b.audit_year and a.audit_month = b.audit_month
    left join INVOICE_AUDIT_DATA as c
        on a.vendor = c.vendor and a.site = c.site
         and a.invoice_year = c.invoice_year and a.invoice_month = c.invoice_month
         and a.audit_year-2 = c.audit_year and a.audit_month = c.audit_month
    order by 1,2,5,6;
    

    which adds no new information, as the data does not cover the time range.

    enter image description here

    Now given your data is strictly dense, as it appears to be (aka for a invoice year/month set, there are no gaps in audit year/month) you can use fixed offset lag, thus:

    select a.*
        ,lag(amount, 12)over(partition by invoice_year, invoice_month, vendor, site order by audit_year, audit_month) as ay_m1_amount
        ,lag(amount, 24)over(partition by invoice_year, invoice_month, vendor, site order by audit_year, audit_month) as ay_m2_amount
    from INVOICE_AUDIT_DATA as a
    order by 1,2,5,6;
    

    gives the same results:

    enter image description here