sqlsql-serverdatesql-server-2017change-tracking

Transforming a log file that tracks changes with timestamps to a valid-from valid-to type table


I have a kind of log table that I would like to transform using SQL Server - adding a validity interval (valid-from, valid-to fields) to records.

The source table is rather large (100K+ records), so an optimal solution would be required. It is acutally a table that tracks content (status) changes in a field for individual records.

Illustration of the source table:

RecordId DateStamp OldValue Newvalue
1 2021.01.01. A Value Another Value
1 2021.02.01 Another Value Yet Another Value
1 2021.03.01 Yet Another Value A Value
1 2021.04.01 A Value Yet Another Value
4 2021.01.01 B Value Next Value
4 2021.02.01 Next Value Second Next Value
4 2021.03.01 Second Next Value Final Value

RecordId refers to the record on which the field change has been made. OldValue and NewValue refers to the value of the field before and after the change has been made.

Changes are consecutive. RecordId and OldValue and NewValue fields are arbitrary.

Value fields can switch back and forth between statuses.

Definition of the source table:

CREATE TABLE Log (
   RecordId INT
  , DateStamp DATE
  , OldValue VARCHAR(255)
  , NewValue VARCHAR(255)
);

Example data for table:

INSERT INTO Log (RecordId, DateStamp, OldValue, NewValue) VALUES (1, '2021-01-01', 'A Value', 'Another Value');
INSERT INTO Log (RecordId, DateStamp, OldValue, NewValue) VALUES (1, '2021-02-01', 'Another Value', 'Yet Another Value');
INSERT INTO Log (RecordId, DateStamp, OldValue, NewValue) VALUES (1, '2021-03-01', 'Yet Another Value', 'A Value');
INSERT INTO Log (RecordId, DateStamp, OldValue, NewValue) VALUES (1, '2021-04-01', 'A Value', 'Yet Another Value');
INSERT INTO Log (RecordId, DateStamp, OldValue, NewValue) VALUES (4, '2021-01-01', 'B Value', 'Next Value');
INSERT INTO Log (RecordId, DateStamp, OldValue, NewValue) VALUES (4, '2021-02-01', 'Next Value', 'Second Next Value');
INSERT INTO Log (RecordId, DateStamp, OldValue, NewValue) VALUES (4, '2021-03-01', 'Second Next Value', 'Final Value');

Illustration of the desired output table with date slicing - so a querying for a status on a given date is possible using the validity dates:

RecordId Value ValidFromDate ValidToDate
1 A Value 1900.01.01. (or null) 2020.12.31.
1 Another Value 2021.01.01. 2021.01.31.
1 Yet Another Value 2022.02.01 2022.02.28.
1 A Value 2022.03.01 2022.03.31.
1 Yet Another Value 2022.04.01 9999.12.31 (or null)
4 B Value 1900.01.01 (or null) 2021.12.31
4 Next Value 2022.01.01 2022.01.31.
4 Second Next Value 2022.02.01 2022.02.28.
4 Final 2022.03.01 9999.12.31 (or null)

Is there an efficient way to do this?


Solution

  • One method would be to unpivot the first row, to get 2 rows for the "old" and "new" value, and then you can use LEAD to get the valid to date. You can determine the first row, by using ROW_NUMBER in a CTE (or derived table), and use the WHERE to only return the 1st row, or the "new" rows:

    WITH CTE AS(
        SELECT RecordId,
               DateStamp,
               OldValue,
               NewValue,
               ROW_NUMBER() OVER (PARTITION BY RecordID ORDER BY DateStamp) AS RN
        FROM (VALUES(1,CONVERT(date,'20210101'),'A Value ','Another Value'),
                    (1,CONVERT(date,'20210201'),'Another Value ','Yet Another Value'),
                    (1,CONVERT(date,'20210301'),'Yet Another Value ','A Value'),
                    (1,CONVERT(date,'20210401'),'A Value ','Yet Another Value'),
                    (4,CONVERT(date,'20210101'),'B Value ','Next Value'),
                    (4,CONVERT(date,'20210201'),'Next Value ','Second Next Value'),
                    (4,CONVERT(date,'20210301'),'Second Next Value ','Final Value'))V(RecordId,DateStamp,OldValue,NewValue))
    SELECT C.RecordId,
           V.Value AS Value,
           CASE V.ValueType WHEN 'New' THEN C.DateStamp END AS ValidFromDate,
           CASE V.ValueType WHEN 'New' THEN LEAD(DATEADD(DAY, -1, C.DateStamp)) OVER (PARTITION BY C.RecordID ORDER BY C.DateStamp) ELSE  DATEADD(DAY, -1, C.DateStamp) END AS ValidToDate
    FROM CTE C
         CROSS APPLY (VALUES(C.OldValue, 'Old'),(C.NewValue, 'New'))V(Value,ValueType)
    WHERE RN = 1
       OR ValueType = 'New';