sqlsql-serverlead

Trouble with sort order using Lead with window function


I'm trying to put a flag on claims that has an Approved followed by a Terminated on the same date. So in the example below ClaimID = 123 has both an Approved and a Terminated on 12/12/2024 and this would have a flag = 1. The Approved has to come first, so in this example the Approved on 12/12 came at 01:19 and the Terminated on 12/12 at 01:20. The same holds true for ClaimID = 333.

The dataset looks as follows:

ClaimID Status CompleteDate
123 Terminated 12/12/2024 01:20
123 Approved 12/12/2024 01:19
123 Approved 12/10/2024 02:00
123 Cancelled 12/23/2024 02:01
234 Cancelled 03/25/2024 03:00
234 Approved 03/24/2024 02:20
234 Terminated 03/25/2024 02:22
333 Approved 04/01/2024 03:00
333 Terminated 04/01/2024 03:02
444 Terminated 04/02/2024 03:00
444 Approved 04/02/2024 03:02

The desired output would look like:

ClaimID Status CompleteDate Flag
123 Approved 12/10/2024 0
123 Approved 12/12/2024 1
123 Terminated 12/12/2024 1
123 Cancelled 12/23/2024 0
234 Approved 03/24/2024 0
234 Terminated 03/25/2024 0
234 Cancelled 03/25/2024 0
333 Approved 04/01/2024 1
333 Terminated 04/01/2024 1
444 Terminated 04/02/2024 0
444 Approved 04/02/2024 0
declare @t table (
    claimid int,
    status varchar(100),
    completedate datetime
);

insert into @t (claimid, status, completedate)
values
(123, 'Terminated', '12/12/2024 01:20'),    
(123, 'Approved', '12/12/2024 01:19' ),
(123, 'Approved', '12/10/2024 02:00'),
(123, 'Cancelled', '12/23/2024 02:01'),    
(234, 'Cancelled', '03/25/2024 03:00'),
(234, 'Approved', '03/24/2024 02:20'),    
(234, 'Terminated', '03/25/2024 02:22'),
(333, 'Approved', '04/01/2024 03:00'),
(333, 'Terminated', '04/01/2024 02:40'),
(444, 'Terminated', '04/02/2024 03:00'),
(444, 'Approved', '04/02/2024 03:02');
  
select *       
from @t;

I've tried a window function using lead but it's not working. I'm thinking it doesn't work because the data isn't sorted properly but I can't figure out how to keep the sort order when doing the function.

My query is:

select *,
    case when (status = 'Approved' and lead(status, 1) over (partition by claimid, completedate 
    order by completedate ASC) = 'Terminated') then 1 else 0 end as Flag
from #test

Solution

  • Rather than using lead use a conditional sum partitioned by claimid and completeddate which checks for either of the statuses and if it finds more than 1 we have a match.

    select *
        , case when sum(case when status = 'Approved' or status = 'Terminated' then 1 else 0 end)
        over (partition by claimid, completedate) > 1 then 1 else 0 end Flag
    from @t;
    

    db<>fiddle

    In order to tighten up the check to ensure that 'Terminated' follows 'Approved' we can use the following solution which for 'Approved' looks ahead within the same date and claim for a 'Terminated' and for 'Terminated' looks behind within the same date and claim for an 'Approved':

    select *
      , case when max(case when status = 'Terminated' then 1 else 0 end)
      over (
        partition by claimid, convert(date, completedate)
        order by completedate desc rows between unbounded preceding and 1 preceding
      ) = 1 or
      max(case when status = 'Approved' then 1 else 0 end)
      over (
        partition by claimid, convert(date, completedate)
        order by completedate asc rows between unbounded preceding and 1 preceding
      ) = 1 then 1 else 0 end Flag
    from @t;
    

    Note: This could still fail is there were multiple 'Approved' and 'Terminated' events for the same claim and date.

    db<>fiddle