I've implemented four SCDs for the warehouse I'm working on and they're working a treat. Hopefully I've understood SCD methods correctly and assuming I have, I now have a bit of an issue I'd like help with.
The issue is that there are fields that can be updated without needing to make a new dimension as they're not time-relevant. So in the MERGE statement I'd like to have two statements:
WHEN MATCHED AND tgt.HashBytes_Value <> src.HashBytes_Value
THEN
UPDATE Row_Expiry_Details
WHEN MATCHED AND tgt.HashBytes_Value = src.HashBytes_Value
THEN
UPDATE Columns_That_Are_Not_SCD_Relevant
etc..
I'm not sure how to add a second WHEN MATCHED to the query.
Here is an example of the code that runs fine now, but doesn't do what I want.
-- Created temp tables for example
DROP TABLE IF EXISTS #StagingTable
CREATE TABLE #StagingTable
(
id INT,
name VARCHAR(20),
team VARCHAR(20),
preferences VARCHAR(50),
hashbytes_Value VARBINARY(20)
)
DROP TABLE IF EXISTS #PresTable
CREATE TABLE #PresTable
(
dim_id INT IDENTITY(1,1) PRIMARY KEY,
id INT,
name VARCHAR(20),
team VARCHAR(20),
preferences VARCHAR(50),
hashbytes_Value VARBINARY(20),
row_Current_ind BIT DEFAULT(1),
row_effective_date DATETIME2 DEFAULT(SYSDATETIME()),
row_expiry_date DATETIME2 DEFAULT(SYSDATETIME())
)
-- Load Staging
insert into #StagingTable (id, name, team, preferences, hashbytes_Value)
select id = 1, name = 'archibald', team = 'team 1', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'archibald' for xml raw))
union select id = 2, name = 'dave', team = 'team 1', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'dave' for xml raw))
union select id = 3, name = 'peter', team = 'team 2', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 2', name = 'peter' for xml raw))
union select id = 4, name = 'roger', team = 'team 2', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 2', name = 'roger' for xml raw))
-- scd merge
insert into #PresTable (id, name, team, preferences, hashbytes_Value)
select id, name, team, preferences, hashbytes_Value
from
(
merge into #PresTable as tgt
using
(
select
id
,name
,team
,preferences
,hashbytes_value
from
#StagingTable
) as src
-- alias
(id, name, team, preferences, hashbytes_Value)
on src.id = tgt.id
-- must be most recent row
and tgt.row_current_ind = 1
when matched
and tgt.hashbytes_value <> src.hashbytes_value
then
update
set
tgt.row_current_ind = 0
,row_expiry_date = sysdatetime()
when not matched
then
insert
(
id
,name
,team
,preferences
,hashbytes_Value
)
values
(
src.id
,src.name
,src.team
,src.preferences
,src.hashbytes_value
)
output
$action
,src.id
,src.name
,src.team
,src.preferences
,src.hashbytes_value
)
as changes
(
action
,id
,name
,team
,preferences
,hashbytes_Value
) where action = 'update'
truncate table #StagingTable
insert into #StagingTable (id, name, team, preferences, hashbytes_Value)
select id = 1, name = 'archibald', team = 'team 1', preferences = 'Updated Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'archibald' for xml raw))
union select id = 3, name = 'peter', team = 'team 1', preferences = 'Updated Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'peter' for xml raw))
union select id = 5, name = 'russell', team = 'team 2', preferences = 'Updated Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 2', name = 'roger' for xml raw))
-- scd merge
insert into #PresTable (id, name, team, preferences, hashbytes_Value)
select id, name, team, preferences, hashbytes_Value
from
(
merge into #PresTable as tgt
using
(
select
id
,name
,team
,preferences
,hashbytes_value
from
#StagingTable
) as src
-- alias
(id, name, team, preferences, hashbytes_Value)
on src.id = tgt.id
-- must be most recent row
and tgt.row_current_ind = 1
when matched
and tgt.hashbytes_value <> src.hashbytes_value
then
update
set
tgt.row_current_ind = 0
,row_expiry_date = sysdatetime()
when not matched
then
insert
(
id
,name
,team
,preferences
,hashbytes_Value
)
values
(
src.id
,src.name
,src.team
,src.preferences
,src.hashbytes_value
)
output
$action
,src.id
,src.name
,src.team
,src.preferences
,src.hashbytes_value
)
as changes
(
action
,id
,name
,team
,preferences
,hashbytes_Value
) where action = 'update'
select * from #PresTable
That final output should be as you see it BUT preferences should be 'Updated Throwaway String' where id = 1.
Is this possible in the same merge statement? Or do I need an update before/after I run the merge statement?
Post was edited by somebody, but this is the example I want to see in the end:
UPDATE TGT
SET
TGT.preferences = SRC.preferences
FROM
#PresTable TGT
INNER JOIN
#StagingTable SRC
ON SRC.id = TGT.id
WHERE 1=1
AND TGT.hashbytes_Value = SRC.hashbytes_Value
select * from #PresTable
Thank you.
The MERGE docs says (emphasis mine):
WHEN MATCHED THEN merge_matched
Specifies that all rows of *target_table, which match the rows returned by ON , and satisfy any additional search condition, are either updated or deleted according to the clause.
The MERGE statement can have, at most, two WHEN MATCHED clauses. If two clauses are specified, the first clause must be accompanied by an AND clause. For any given row, the second WHEN MATCHED clause is only applied if the first isn't. If there are two WHEN MATCHED clauses, one must specify an UPDATE action and one must specify a DELETE action. When UPDATE is specified in the clause, and more than one row of matches a row in target_table based on , SQL Server returns an error. The MERGE statement can't update the same row more than once, or update and delete the same row.
So you can't issue 2 different UPDATE
on the match condition. What you can do is use 1 UPDATE
and conditionally update the columns to the new value or their current value if your condition doesn't apply:
WHEN MATCHED THEN UPDATE TableName SET
Column1 = CASE WHEN tgt.HashBytes_Value <> src.HashBytes_Value THEN UpdatedColumn1 ELSE Column1 END,
Column2 = CASE WHEN tgt.HashBytes_Value <> src.HashBytes_Value THEN UpdatedColumn2 ELSE Column2 END,
Column3 = CASE WHEN tgt.HashBytes_Value <> src.HashBytes_Value THEN Column3 ELSE UpdatedColumn3 END
Keep in mind that triggers will be called for rows that might not actually get "updated".