sql-servert-sqlscd

T-SQL Slowly Changing Dimension - Multiple Updates, Single Action Output?


I've implemented four SCDs for the warehouse I'm working on and they're working a treat. Hopefully I've understood SCD methods correctly and assuming I have, I now have a bit of an issue I'd like help with.

The issue is that there are fields that can be updated without needing to make a new dimension as they're not time-relevant. So in the MERGE statement I'd like to have two statements:

WHEN MATCHED AND tgt.HashBytes_Value <> src.HashBytes_Value
THEN 
   UPDATE Row_Expiry_Details
WHEN MATCHED AND tgt.HashBytes_Value = src.HashBytes_Value
THEN
   UPDATE Columns_That_Are_Not_SCD_Relevant
etc..

I'm not sure how to add a second WHEN MATCHED to the query.

Here is an example of the code that runs fine now, but doesn't do what I want.

-- Created temp tables for example
DROP TABLE IF EXISTS #StagingTable
CREATE TABLE #StagingTable
(
    id INT,   
    name VARCHAR(20),
    team VARCHAR(20),
    preferences VARCHAR(50),
    hashbytes_Value VARBINARY(20)
)

DROP TABLE IF EXISTS #PresTable

CREATE TABLE #PresTable
(
    dim_id INT IDENTITY(1,1) PRIMARY KEY,
    id INT,
    name VARCHAR(20),
    team VARCHAR(20),   
    preferences VARCHAR(50),
    hashbytes_Value VARBINARY(20),
    row_Current_ind BIT DEFAULT(1),
    row_effective_date DATETIME2 DEFAULT(SYSDATETIME()),
    row_expiry_date DATETIME2 DEFAULT(SYSDATETIME())
)

-- Load Staging
insert into #StagingTable (id, name, team, preferences, hashbytes_Value)
select id = 1, name = 'archibald', team = 'team 1', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'archibald' for xml raw))
union select id = 2, name = 'dave',  team = 'team 1', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'dave' for xml raw))
union select id = 3, name = 'peter', team = 'team 2', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 2', name = 'peter' for xml raw))
union select id = 4, name = 'roger', team = 'team 2', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 2', name = 'roger' for xml raw))

-- scd merge 
insert into #PresTable (id, name, team, preferences, hashbytes_Value)
select id, name, team, preferences, hashbytes_Value 
from
(
merge into #PresTable as tgt
using
(
select
    id
    ,name
    ,team
    ,preferences
    ,hashbytes_value
from
    #StagingTable
) as src
-- alias
(id, name, team, preferences, hashbytes_Value)
on src.id = tgt.id
-- must be most recent row
and tgt.row_current_ind = 1
when matched
and tgt.hashbytes_value <> src.hashbytes_value
then
update 
set 
    tgt.row_current_ind = 0
    ,row_expiry_date = sysdatetime()
when not matched
then 
insert
(
    id
    ,name
    ,team
    ,preferences
    ,hashbytes_Value
)
values
(
    src.id
    ,src.name
    ,src.team
    ,src.preferences
    ,src.hashbytes_value
)
output
    $action
    ,src.id
    ,src.name
    ,src.team
    ,src.preferences
    ,src.hashbytes_value
)
as changes
(
    action
    ,id
    ,name
    ,team
    ,preferences
    ,hashbytes_Value
) where action = 'update'

truncate table #StagingTable
insert into #StagingTable (id, name, team, preferences, hashbytes_Value)
select id = 1, name = 'archibald', team = 'team 1', preferences = 'Updated Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'archibald' for xml raw))
union select id = 3, name = 'peter', team = 'team 1', preferences = 'Updated Throwaway String',  hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'peter' for xml raw))
union select id = 5, name = 'russell', team = 'team 2', preferences = 'Updated Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 2', name = 'roger' for xml raw))


-- scd merge 
insert into #PresTable (id, name, team, preferences, hashbytes_Value)
select id, name, team, preferences, hashbytes_Value 
from
(
merge into #PresTable as tgt
using
(
select
    id
    ,name
    ,team
    ,preferences
    ,hashbytes_value
from
    #StagingTable
) as src
-- alias
(id, name, team, preferences, hashbytes_Value)
on src.id = tgt.id
-- must be most recent row
and tgt.row_current_ind = 1
when matched
and tgt.hashbytes_value <> src.hashbytes_value
then
update 
set 
    tgt.row_current_ind = 0
    ,row_expiry_date = sysdatetime()
when not matched
then 
insert
(
    id
    ,name
    ,team
    ,preferences
    ,hashbytes_Value
)
values
(
    src.id
    ,src.name
    ,src.team
    ,src.preferences
    ,src.hashbytes_value
)
output
    $action
    ,src.id
    ,src.name
    ,src.team
    ,src.preferences
    ,src.hashbytes_value
)
as changes
(
    action
    ,id
    ,name
    ,team
    ,preferences
    ,hashbytes_Value
) where action = 'update'




select * from #PresTable

That final output should be as you see it BUT preferences should be 'Updated Throwaway String' where id = 1.

Is this possible in the same merge statement? Or do I need an update before/after I run the merge statement?

Post was edited by somebody, but this is the example I want to see in the end:



UPDATE TGT
SET
    TGT.preferences = SRC.preferences
FROM 
    #PresTable TGT
INNER JOIN 
    #StagingTable SRC
    ON SRC.id = TGT.id
WHERE 1=1
    AND TGT.hashbytes_Value = SRC.hashbytes_Value

select * from #PresTable

Thank you.


Solution

  • The MERGE docs says (emphasis mine):

    WHEN MATCHED THEN merge_matched

    Specifies that all rows of *target_table, which match the rows returned by ON , and satisfy any additional search condition, are either updated or deleted according to the clause.

    The MERGE statement can have, at most, two WHEN MATCHED clauses. If two clauses are specified, the first clause must be accompanied by an AND clause. For any given row, the second WHEN MATCHED clause is only applied if the first isn't. If there are two WHEN MATCHED clauses, one must specify an UPDATE action and one must specify a DELETE action. When UPDATE is specified in the clause, and more than one row of matches a row in target_table based on , SQL Server returns an error. The MERGE statement can't update the same row more than once, or update and delete the same row.

    So you can't issue 2 different UPDATE on the match condition. What you can do is use 1 UPDATE and conditionally update the columns to the new value or their current value if your condition doesn't apply:

    WHEN MATCHED THEN UPDATE TableName SET
        Column1 = CASE WHEN tgt.HashBytes_Value <> src.HashBytes_Value THEN UpdatedColumn1 ELSE Column1 END,
        Column2 = CASE WHEN tgt.HashBytes_Value <> src.HashBytes_Value THEN UpdatedColumn2 ELSE Column2 END,
        Column3 = CASE WHEN tgt.HashBytes_Value <> src.HashBytes_Value THEN Column3 ELSE UpdatedColumn3 END
    

    Keep in mind that triggers will be called for rows that might not actually get "updated".