sqlteradata

For duplicate rows, creating a column that shows status based on date


I am hoping for some help with the scenario below in SQL

Current State:

enter image description here

For a particular combination of med_id, casemgr_id, casemgr_clnt_id, the status should be reflected accordingly:

Desired Output:

enter image description here

In the first scenario, for med_id=98410, casemgr_id=12345, casemgr_clnt_id=67891, the date (1strow) is valid, but for the same rows, if the dates fall within 6 months, then the status should be invalid. In the second scenario, for med_id=91956, casemgr_id=99012, casemgr_clnt_id=87567, the date (1strow) is valid, but for the second row, the status is valid since the date is greater than 6 months. In summary, for duplicate rows for med_id, casemgr_id and casemgr_clntid, the status for the first row will be valid, but for the subsequent row, it will be valid id the date is > 6 months, otherwise it will be invalid if less than 6 months.

How do I achieve this in SQL? Please assist!


Solution

  • You can't without window functions, as @Jan Suchanek already showed.

    Mine is similar - it also uses LAG(), but it uses some sort of "anchor" using FIRST_VALUE() , to use MONTHS_BETWEEN() upon:

    WITH
    -- your input ...
    indata(id1,med_id,date,casemgr_id,casemgr_cln,status) AS (
              SELECT 123456,98410,DATE '4/19/24',12345,67891,-2
    UNION ALL SELECT 789101,98410,DATE '4/24/24',122345,67891,-2
    UNION ALL SELECT 234561,98410,DATE '4/25/24',12345,67891,-2
    UNION ALL SELECT 567890,98410,DATE '4/26/24',12345,67891,-2
    UNION ALL SELECT 456789,91956,DATE '4/20/24',99012,87567,-2
    UNION ALL SELECT 998415,91956,DATE '12/20/24',99012,87567,-2
    )
    -- end of input, real query starts here ...
    SELECT
      id1
    , med_id
    , date
    , casemgr_id
    , casemgr_cln
    , CASE
        WHEN MONTHS_BETWEEN(
               date
             , FIRST_VALUE(date) OVER(PARTITION BY med_id, casemgr_id, casemgr_cln ORDER BY date)
             )  > 6
          OR MONTHS_BETWEEN(
               date
             , FIRST_VALUE(date) OVER(PARTITION BY med_id, casemgr_id, casemgr_cln ORDER BY date)
             )  = 0
        THEN 'valid'
        ELSE 'invalid'
      END AS status
    FROM indata
    ORDER BY med_id, casemgr_id, casemgr_cln, date 
    ;
    
    id1 med_id date casemgr_id casemgr_cln status
    456,789 91,956 2024-04-20 99,012 87,567 valid
    998,415 91,956 2024-12-20 99,012 87,567 valid
    123,456 98,410 2024-04-19 12,345 67,891 valid
    234,561 98,410 2024-04-25 12,345 67,891 invalid
    567,890 98,410 2024-04-26 12,345 67,891 invalid
    789,101 98,410 2024-04-24 122,345 67,891 valid