How can I write a CTE in SQL Server to get the End_Date
for a given partition of rows, where the partition is by Person & Type, with a new partition starting if the Type changes within a person. End_Date
is defined as the Start_Date
of the next Type partition within a Person partition. The last partition within a Person will always have a null End_Date
.
I previously asked a similar question, which gained an answer, but I had neglected to show the case where the Type might no only change multiple times within a Person, but might also repeat the same Type again. This scenario should also create a new partition as shown in the data below.
For the following source data, records should be sorted by dt_eff
for the person.
Source Data
Person Type dt_eff
123 ABC 2018-10-23
123 DEF 2018-12-19
124 ABC 2020-01-01
124 ABC 2020-02-15
124 ABC 2020-05-14
124 DEF 2020-10-13 < Note change of Type to DEF
124 ABC 2021-01-15 < Note change of Type back to ABC
Expected Output
Person Type Start_Date End_Date Partition
123 ABC 2018-10-23 2018-12-19 123:1 < Start_Date from 123:DEF
123 DEF 2018-12-19 NULL 123:2
124 ABC 2020-01-01 2020-10-13 124:1
124 ABC 2020-02-15 2020-10-13 124:1
124 ABC 2020-05-14 2020-10-13 124:1 < Start_Date from 124:DEF
124 DEF 2020-10-13 2021-01-15 124:2< Start_Date from next 124:ABC
124 ABC 2021-01-15 NULL 124:3
Note the Partition
column is not a required output, its just added for clarify
The key takeaway here is that you can't always produce the results you want in a single query... Sometimes you have to take multiple steps. The following approach is the one I use:
The following produces the results you desire... how is works is explained in the comments.
with cte1 as (
select *
-- 1. Find the dt_eff value of the next row, we need this to get the end date
, lead(dt_eff) over (partition by Person order by dt_eff) dt_eff_lead
-- Find the Type value of the previous row, we need this to detect a change in type
, lag(Type, 1, Type) over (partition by Person order by dt_eff) type_lag
from Person
), cte2 as (
select Person, Type, dt_eff Start_Date
, dt_eff_lead
-- Count the number of Type transitions by comparing this row with the previous row
-- prior to and including the current row
-- Each time the count changes gives us a new partition to use in the final result
, sum(case when Type <> type_lag then 1 else 0 end)
over (partition by person order by dt_eff asc
rows between unbounded preceding and current row) TypeGroup
from cte1
)
select Person, Type, Start_Date
-- Get the maximum dt_eff_lead for a given type partition within a person partition
, max(dt_eff_lead) over (partition by Person, TypeGroup) End_Date
from cte2
order by Person, Start_Date, Type;
Fiddle (using data from your last question slightly modified)