Using MSSQL 2022.
Source temp table has 156,289 rows. Destination table has zero rows. Use case is to insert rows from temp->destination where rows do not already exist in destination. 4 columns determine if a row is unique.
Using Merge When Not Matched by Target, the first time code is run it takes less than 10 seconds to complete, inserting 156,289. The second time it is run, it takes ~8 minutes to complete, inserting zero rows. The behavior is correct but it takes a very long time. https://www.brentozar.com/pastetheplan/?id=H7Cmfrd2n shows a 440mb memory grant and estimated rows 358,028 even though zero are written the second time.
Using Insert Where Not Exists, the the first time code is run it takes less than 10 seconds to complete, inserting 156,289. The second time it is run, it again takes a few seconds to complete, inserting zero rows. The behavior is correct and performs acceptably. https://www.brentozar.com/pastetheplan/?id=SyQ8fdu27 shows no memory grant and estimates 156,289 rows even though zero are written the second time.
Create Table #InputInterimTags ( AspectKey VarChar(400) , AspectValue VarChar(400) , InputInterimId Int , Tpid Int )
Create Clustered Index #InputInterimTags_Pk On #InputInterimTags (Tpid,AspectKey,AspectValue)
... temp table is populated...
Update Statistics #InputInterimTags With Fullscan
Create Table dbo.Aspects
(
AspectsId Int Identity(9,1) Primary Key
, RowStampUtc Datetime2(7)
, DictAspectsId Int
, AspectKey VarChar(400)
, AspectValue VarChar(400)
, DictAspectsAssociationStatesId Int
, Tpid BigInt
)
Create Index Aspects_Ix01 On dbo.Aspects (Tpid,AspectKey,AspectValue) Include (DictAspectsId)
Have tried different combinations of composite and single indexes on Aspects table, collation, additional filtering.
Thanks.
There are two issues here:
MERGE
is unnecessary. As you have noted, you can just use INSERT...WHERE NOT EXISTS
. But you don't need to, and should not, re-join #InputInterimTags
in the NOT EXISTS
subquery.UNIQUE
, you are getting a many-to-many Merge Join in the plan, which is rather inefficient, and makes the server want to add a Spool as well.EXISTS
subquery does not need to select anything, it can just SELECT 1
.Create Table #InputInterimTags ( AspectKey VarChar(400) , AspectValue VarChar(400) , InputInterimId Int , Tpid Int )
Create UNIQUE Clustered Index #InputInterimTags_Pk On #InputInterimTags (Tpid,AspectKey,AspectValue)
... temp table is populated...
Update Statistics #InputInterimTags With Fullscan
Create Table dbo.Aspects
(
AspectsId Int Identity(9,1) Primary Key
, RowStampUtc Datetime2(7)
, DictAspectsId Int
, AspectKey VarChar(400)
, AspectValue VarChar(400)
, DictAspectsAssociationStatesId Int
, Tpid BigInt
)
Create UNIQUE Index Aspects_Ix01 On dbo.Aspects (Tpid,AspectKey,AspectValue) Include (DictAspectsId)
You may want to consider making Aspects_Ix01 On dbo.Aspects
a clustered index, and its primary key non-clustered, depending on your use case.
Then your insert should be like this.
INSERT INTO Aspects (
DictAspectsId, AspectKey, AspectValue, DictAspectsAssociationStatesId, Tpid
)
SELECT
1,
iit.AspectKey,
iit.AspectValue,
1,
iit.Tpid
FROM #InputInterimTags iit WITH (TABLOCK) -- might as well tablock for performance as it's a temp table
WHERE NOT EXISTS (SELECT 1
FROM Aspects asp
WHERE asp.Tpid = iit.Tpid
AND asp.AspectKey = iit.AspectKey
AND asp.AspectValue = iit.AspectValue
AND asp.DictAspectsId = 1
);
Consider using TABLOCK
on the inserted table also, assuming you are not worried about blocking anyone else.