sqlsql-servert-sqlsql-server-2016

Calculate the end date of partition rows using the start date of the following partition


How can I write a CTE in SQL Server to get the End_Date for a given partition of rows, where the partition is by Person & Type, with a new partition starting if the Type changes within a person. End_Date is defined as the Start_Date of the next Type partition within a Person partition. The last partition within a Person will always have a null End_Date.

I previously asked a similar question, which gained an answer, but I had neglected to show the case where the Type might no only change multiple times within a Person, but might also repeat the same Type again. This scenario should also create a new partition as shown in the data below.

For the following source data, records should be sorted by dt_eff for the person.

Source Data

Person    Type     dt_eff
123       ABC      2018-10-23
123       DEF      2018-12-19
124       ABC      2020-01-01
124       ABC      2020-02-15
124       ABC      2020-05-14
124       DEF      2020-10-13  < Note change of Type to DEF
124       ABC      2021-01-15  < Note change of Type back to ABC

Expected Output

Person    Type     Start_Date   End_Date     Partition
123       ABC      2018-10-23   2018-12-19   123:1 < Start_Date from 123:DEF
123       DEF      2018-12-19   NULL         123:2
124       ABC      2020-01-01   2020-10-13   124:1
124       ABC      2020-02-15   2020-10-13   124:1
124       ABC      2020-05-14   2020-10-13   124:1 < Start_Date from 124:DEF
124       DEF      2020-10-13   2021-01-15   124:2< Start_Date from next 124:ABC
124       ABC      2021-01-15   NULL         124:3

Note the Partition column is not a required output, its just added for clarify


Solution

  • The key takeaway here is that you can't always produce the results you want in a single query... Sometimes you have to take multiple steps. The following approach is the one I use:

    1. Work out logically how the problem might be solved.
    2. Build a query up that produces the logical result.
    3. See if there is an opportunity to simplify the query.

    The following produces the results you desire... how is works is explained in the comments.

    with cte1 as (
      select *
        -- 1. Find the dt_eff value of the next row, we need this to get the end date
        , lead(dt_eff) over (partition by Person order by dt_eff) dt_eff_lead
        -- Find the Type value of the previous row, we need this to detect a change in type
        , lag(Type, 1, Type) over (partition by Person order by dt_eff) type_lag
      from Person
    ), cte2 as (
      select Person, Type, dt_eff Start_Date
        , dt_eff_lead
        -- Count the number of Type transitions by comparing this row with the previous row
        -- prior to and including the current row
        -- Each time the count changes gives us a new partition to use in the final result
        , sum(case when Type <> type_lag then 1 else 0 end)
          over (partition by person order by dt_eff asc
            rows between unbounded preceding and current row) TypeGroup
      from cte1
    )
    select Person, Type, Start_Date
      -- Get the maximum dt_eff_lead for a given type partition within a person partition
      , max(dt_eff_lead) over (partition by Person, TypeGroup) End_Date
    from cte2
    order by Person, Start_Date, Type;
    

    Fiddle (using data from your last question slightly modified)