t-sqlgaps-and-islandsscd

How to make non-sequential ranges unique?


I am strugling with some T-SQL code to get column [Num] (in below example table). Basically, I am trying to create an index on when a new sequence of [Type] is starting.

Q: How to create the [num] field below?

[Num] is the order of sequential ranges of field [Type]. You see that from 2022-01-01 to 2022-01-03 [Type] = 'Working'. Then on 2022-01-07, a new range of sequential rows with the same type starts (only one day though) and therefore it is 2. Basically, if there is no sequential days of the same type and another day pops up with a different type, the range stops. When a new day starts with that same type (> 1 days later), a new range starts. You can assume there cannot be different types on one day in my case.

What makes it tricky for me is that I cannot partition it on Type as it is recurring in a new sequence later on using a regular row_number.

Some extra explanation why I need this: I am creating a SCD dimension with a valid from and valid to date by taking the min and max grouped by [Type]. But since [Type] is recurring this does not work. I need a way to make each nonsequential series of [Type] unique. I am of course open to other suggestions if my solution is sub optimal.

Table_1

Date Type Num
2022-01-01 Working 1
2022-01-02 Working 1
2022-01-03 Working 1
2022-01-04 Resting 1
2022-01-05 Resting 1
2022-01-06 Resting 1
2022-01-07 Working 2
2022-01-08 Resting 2
2022-01-09 Resting 2
2022-01-10 Resting 2
2022-01-11 Resting 2
2022-01-12 Resting 2
2022-01-12 Traveling 1
2022-01-13 Working 3
2022-01-14 Traveling 2
2022-01-15 Working 4
2022-01-16 Working 4

Thank you so much!

Update:

I omit the information that is in my belief not necessary.

The end goal of this question is to create a table as below (table_2). If I have field [num] in table_1 I will be able to create table_2 myself but I am open to suggestions as well of course.

Table_2

Date Type Start Date End Date
2022-01-01 Working 2022-01-01 2022-01-03
2022-01-04 Resting 2022-01-04 2022-01-06
2022-01-07 Working 2022-01-07 2022-01-07
2022-01-08 Resting 2022-01-08 2022-01-12
2022-01-12 Traveling 2022-01-12 2022-01-12
2022-01-13 Working 2022-01-13 2022-01-13
2022-01-14 Traveling 2022-01-14 2022-01-14
2022-01-15 Working 2022-01-15 2022-01-16

Solution

  • This is a classic gaps and island problem. Find your islands, and then DENSE_RANK them:

    WITH Grps AS(
    SELECT V.Date,
           V.Type,
           V.Num,
           ROW_NUMBER() OVER (ORDER BY V.Date) - 
           ROW_NUMBER() OVER (PARTITION BY V.[Type] ORDER BY V.[Date]) AS Grp
    FROM (VALUES(CONVERT(date,'2022-01-01'),'Working',1),
                (CONVERT(date,'2022-01-02'),'Working',1),
                (CONVERT(date,'2022-01-03'),'Working',1),
                (CONVERT(date,'2022-01-04'),'Resting',1),
                (CONVERT(date,'2022-01-05'),'Resting',1),
                (CONVERT(date,'2022-01-06'),'Resting',1),
                (CONVERT(date,'2022-01-07'),'Working',2),
                (CONVERT(date,'2022-01-08'),'Resting',2),
                (CONVERT(date,'2022-01-09'),'Resting',2),
                (CONVERT(date,'2022-01-10'),'Resting',2),
                (CONVERT(date,'2022-01-11'),'Resting',2),
                (CONVERT(date,'2022-01-12'),'Resting',2),
                (CONVERT(date,'2022-01-12'),'Traveling',1),
                (CONVERT(date,'2022-01-13'),'Working',3),
                (CONVERT(date,'2022-01-14'),'Traveling',2),
                (CONVERT(date,'2022-01-15'),'Working',4),
                (CONVERT(date,'2022-01-16'),'Working',4))V([Date], [Type], Num))
    SELECT G.Date,
           G.Type,
           G.Num AS YourNum,
           DENSE_RANK() OVER (PARTITION BY G.Type ORDER BY Grp) AS CalcedNum
    FROM Grps G
    ORDER BY G.Date;