sql-serverdatet-sql

Fill remaining dates between dates in SQL Server


I have the following data in a table:

ItemID Date Status
001 2021-01-12 Active
001 2021-01-16 Discontinued
001 2021-01-20 Active

I need to fill in the remaining dates like this:

ItemID Date Status
001 2021-01-12 Active
001 2021-01-13 Active
001 2021-01-14 Active
001 2021-01-15 Active
001 2021-01-16 Discontinued
001 2021-01-17 Discontinued
001 2021-01-18 Discontinued
001 2021-01-19 Discontinued
001 2021-01-20 Active

Also, I need suggestions on will it be efficient to fill data like this or create two different columns for Valid from and to dates in Data Warehouse?


Solution

  • I have a working solution, but I am sure there are better ways to do this. I assume you would like a working solution, and then you can investigate the performance and optimize it if need be.

    As pointed out in the comments, to solve this it is easiest if you have a calendar table. I assume you do not have anything, so I start from scratch. I generate the numbers 0 - 9 and then through successive CROSS JOINS I use those numbers to generate the numbers 0 - 10,000. I did make the assumption that there are not more than 10,000 days between the minimum date and the maximum date, but if this is not correct you can change the code to generate more numbers.

    My approach uses several common table expressions as this is how I work to incrementally solve a problem. So first generate the digits, then generate numbers, then determine the minimum and maximum dates for each ItemID, then create a recordset that includes all the dates between the minimum and maximum dates for each ItemID, then I LEFT JOIN this to copy the Status. Finally, you have the interesting problem of how to get the last non NULL value for a column, and there are several approaches. Here is one article of many you can see different approaches: https://www.mssqltips.com/sqlservertip/7379/last-non-null-value-set-of-sql-server-records/ I used the approach that uses the MAX function in a window.

    So, putting this all together into a script and starting with your data in a table variable (as well as adding some records for another test), the whole things looks like this:

    DECLARE @Data TABLE([ItemID] VARCHAR(3), [Date] DATE, [Status] VARCHAR(15));
    
    INSERT INTO @Data ([ItemID],[Date],[Status])
    VALUES ('001', '2021-01-12', 'Active'), ('001', '2021-01-16','Discontinued'),('001', '2021-01-20','Active'), 
    ('002','2022-02-01','Active'), ('002','2022-03-01','Discontinued');
    
    ;WITH digits (I) AS 
    (
        SELECT I
        FROM  (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I)
    )
    ,integers (I) AS (
        SELECT D1.I + (10*D2.I) + (100*D3.I) + (1000*D4.I)      
        FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3 CROSS JOIN digits AS D4
    ), itemMinMaxDates AS (
        SELECT [ItemID], MIN([Date]) AS [MinDate], MAX([Date]) AS [MaxDate] 
        FROM @Data GROUP BY [ItemID]
    ), itemsWithAllDates AS 
    (
        SELECT [imm].[ItemID], DATEADD(DAY,i.I, imm.[MinDate]) AS [Date] FROM [itemMinMaxDates] AS imm CROSS JOIN [integers] AS i 
        WHERE DATEADD(DAY,i.I, imm.[MinDate]) BETWEEN imm.[MinDate] AND imm.[MaxDate]
    ), itemsWithAllDatesAndStatus AS 
    (
        SELECT [allDates].[ItemID], [allDates].[Date], [d].[Status] FROM [itemsWithAllDates] AS allDates 
        LEFT OUTER JOIN @Data AS d ON [allDates].[ItemID] = [d].[ItemID] AND [allDates].[Date] = d.[Date]
    ), grp AS 
    (
        SELECT [itemsWithAllDatesAndStatus].[ItemID],
           [itemsWithAllDatesAndStatus].[Date],
           [itemsWithAllDatesAndStatus].[Status], 
           MAX(IIF([itemsWithAllDatesAndStatus].[Status] IS NOT NULL, [itemsWithAllDatesAndStatus].[Date], NULL)) OVER (PARTITION BY [itemsWithAllDatesAndStatus].[ItemID] ORDER BY [itemsWithAllDatesAndStatus].[Date] ROWS UNBOUNDED PRECEDING) AS grp
        FROM itemsWithAllDatesAndStatus 
    ) 
    SELECT [grp].[ItemID], [grp].[Date], 
    MAX([grp].[Status]) OVER (PARTITION BY [grp].[ItemID], grp  ORDER BY [grp].[Date] ROWS UNBOUNDED PRECEDING) AS [Status] 
    FROM [grp] 
    ORDER BY [grp].[ItemID], [grp].[Date];
    

    The result is what you have shown (as well as the data I included for a test):

    ItemID Date Status
    001 2021-01-12 Active
    001 2021-01-13 Active
    001 2021-01-14 Active
    001 2021-01-15 Active
    001 2021-01-16 Discontinued
    001 2021-01-17 Discontinued
    001 2021-01-18 Discontinued
    001 2021-01-19 Discontinued
    001 2021-01-20 Active
    002 2022-02-01 Active
    002 2022-02-02 Active
    002 2022-02-03 Active
    002 2022-02-04 Active
    002 2022-02-05 Active
    002 2022-02-06 Active
    002 2022-02-07 Active
    002 2022-02-08 Active
    002 2022-02-09 Active
    002 2022-02-10 Active
    002 2022-02-11 Active
    002 2022-02-12 Active
    002 2022-02-13 Active
    002 2022-02-14 Active
    002 2022-02-15 Active
    002 2022-02-16 Active
    002 2022-02-17 Active
    002 2022-02-18 Active
    002 2022-02-19 Active
    002 2022-02-20 Active
    002 2022-02-21 Active
    002 2022-02-22 Active
    002 2022-02-23 Active
    002 2022-02-24 Active
    002 2022-02-25 Active
    002 2022-02-26 Active
    002 2022-02-27 Active
    002 2022-02-28 Active
    002 2022-03-01 Discontinued

    Like I said, this is a working solution, but it is likely not the best or most efficient solution - but it gets you up and running.