sql-serveretlscd2

MS SQL Server: transforming SCD-2 to SCD-2 with less variables


I retrieve data from SCD-2 table with many parameters and I need to build my own SCD-2 with only one of them. Therefore, I need to get rid of excessive intervals. Please recommend an algorithm to perform that in the best way.

What I receive from the source table:

what I receive from the source table

I need to transform it to:

I need to transform it to:


Solution

  • You can use following steps to get the required result. Of course you can do it all in one step with sub-selects or CTEs, but for better traceability I prefere temporary tables.

    DROP TABLE IF EXISTS #source;
    CREATE TABLE #source (key1 integer, value1 integer, row_actual_from date, row_actual_to date);
     
    INSERT INTO #source
    VALUES
    (19999923, 15,   '2020-01-01', '2020-01-02'),
    (19999923, 15,   '2020-01-03', '2020-01-05'),
    (19999923, 15,   '2020-01-06', '2020-01-08'),
    (19999923, 11,   '2020-01-09', '2020-01-12'),
    (19999923, 3434, '2020-01-13', '2020-01-15'),
    (19999923, 11,   '2020-01-16', '2020-01-20'),
    (19999923, 15,   '2020-01-21', '2020-02-02'),
    (19999923, 3434, '2020-02-03', '2020-02-10'),
    (19999923, 3434, '2020-02-11', '2020-02-19'),
    (19999923, 3434, '2020-02-20', '2020-02-25'),
    (19999923, 99,   '2020-02-26', '9999-12-31');
    

    Step 1: Identify start and end for a single value period.

    Note that in LAG/LEAD it is essentally to have a value as NULL replacement (-99 in example) which dosn't match with the possible values in the column.

        DROP TABLE IF EXISTS #step1;
        SELECT
            key1, value1, row_actual_from, row_actual_to
            , period_start = CASE WHEN LAG(value1,  1, -99) OVER (PARTITION BY key1 ORDER BY row_actual_from) <> value1 THEN 1 ELSE 0 END
            , period_end   = CASE WHEN LEAD(value1, 1, -99) OVER (PARTITION BY key1 ORDER BY row_actual_from) <> value1 THEN 1 ELSE 0 END
        INTO #step1
        FROM #source
        ORDER BY key1, row_actual_from;
    

    Step 2: Filter on start/end rows and assign row_actual_to of end to start.

    If the period of a value has only one row this row has period_start and period_end set to 1 and therefore the sum is 2. In this case the content of row_acutal_to has already the wanted value.

        DROP TABLE IF EXISTS #step2;
        SELECT
            key1, value1, row_actual_from, row_actual_to, period_start, period_end
          , valid_from = row_actual_from
          , valid_to   = CASE (period_start + period_end)
                         WHEN 1 THEN LEAD(row_actual_to, 1) OVER (PARTITION BY key1, value1 ORDER BY row_actual_from)
                         WHEN 2 THEN row_actual_to ELSE NULL END
        INTO #step2
        FROM #step1
        WHERE (period_start + period_end) > 0
        ORDER BY key1, row_actual_from;
    

    Step 3: Filter (adjusted) start row of value period.

        SELECT key1, value1, valid_from, valid_to
        FROM   #step2
        WHERE  period_start = 1
        ORDER BY key1, row_actual_from;