I'm merging millions (1-10) of rows into a table that contains 20+ millions of rows.
Target table has 16 columns, 1 PK column (NormalID).
Problem I found is that some records have their PK column Ids changed. So for example record "A" with PK id of "12345678", will have its id changed to "4567890". Hence why I've put in a delete in the merge. I've limited the source pull of data to be 2 months at the time.
Source Table (this usually holds a month or two worth of data. For example year 2024 with September (9) and October (10):
CREATE TABLE [dbo].[slowmergetbl_source](
[sEntity] [varchar](50) NOT NULL,
[wYear] [smallint] NOT NULL,
[wPeriod] [smallint] NOT NULL,
[sAccount] [varchar](50) NOT NULL,
[wBracket] [smallint] NOT NULL,
[sCurrency] [varchar](3) NULL,
[dValue] [float] NULL,
[bDirty] [bit] NULL DEFAULT 1,
[dFactValue] [float] NULL,
[wEntityId] [int] NULL,
[wAccountId] [int] NULL,
[wTimeId] [int] NULL,
[wExtDimId1] [int] NOT NULL DEFAULT 0,
[NormalID] [int] NOT NULL,
CONSTRAINT [PK_SourceNormalID] PRIMARY KEY CLUSTERED ([NormalID] ASC)
);
Table script for the main table that is being merged into:
CREATE TABLE [reporting].[slowmergetbl_target](
[sEntity] [varchar](50) NOT NULL,
[wYear] [smallint] NOT NULL,
[wPeriod] [smallint] NOT NULL,
[sAccount] [varchar](50) NOT NULL,
[wBracket] [smallint] NOT NULL,
[sCurrency] [varchar](3) NULL,
[dValue] [float] NULL,
[bDirty] [bit] NULL DEFAULT 1,
[dFactValue] [float] NULL,
[wEntityId] [int] NULL,
[wAccountId] [int] NULL,
[wTimeId] [int] NULL,
[wExtDimId1] [int] NOT NULL DEFAULT 0,
[NormalID] [int] NOT NULL,
[ModifiedDate] [datetime] NULL,
[ModifyType] [varchar](10) NULL,
CONSTRAINT [PK_NormalID] PRIMARY KEY CLUSTERED ([NormalID] ASC),
INDEX [IDX_Year] NONCLUSTERED ([wYear] ASC),
INDEX [IDX_Period] NONCLUSTERED ([wPeriod] ASC),
INDEX [IX_ReportQuery] NONCLUSTERED
(
[wYear],
[dFactValue]
)
INCLUDE ([sEntity],[wPeriod],[sAccount],[wBracket],[wAccountId]),
INDEX [IX_ReportQuery2] NONCLUSTERED
(
[sAccount],
[wBracket])
INCLUDE ([sEntity],[wYear],[wPeriod],[dValue],[wEntityId]),
INDEX [IDX_Normal_ModifyType] NONCLUSTERED ([ModifyType] ASC)
);
Merge statement:
merge reporting.slowmergetbl_target as a
using dbo.slowmergetbl_source as b on a.normalid = b.normalid
WHEN NOT MATCHED BY SOURCE and a.wyear in (
2024
) and a.wPeriod in (9,10 )
THEN UPDATE SET
a.modifieddate=getdate(),
a.modifytype='DELETED'
when not matched by target
then insert (
[sEntity],
[wYear],
[wPeriod],
[sAccount],
[wBracket],
[sCurrency],
[dValue],
[bDirty],
[dFactValue],
[wEntityId],
[wAccountId],
[wTimeId],
[wExtDimId1],
[NormalID],
[ModifiedDate],
[ModifyType]
)
values (
b.[sEntity],
b.[wYear],
b.[wPeriod],
b.[sAccount],
b.[wBracket],
b.[sCurrency],
b.[dValue],
b.[bDirty],
b.[dFactValue],
b.[wEntityId],
b.[wAccountId],
b.[wTimeId],
b.[wExtDimId1],
b.[NormalID],
getdate(),'INSERT')
when matched
and (
isnull(a.[sAccount],'')<>isnull(b.[sAccount],'')
or isnull(a.[sEntity],'')<>isnull(b.[sEntity],'')
or isnull(a.[wYear],0)<>isnull(b.[wYear],0)
or isnull(a.[wPeriod],0)<>isnull(b.[wPeriod],0)
or isnull(a.[wBracket],0)<>isnull(b.[wBracket],0)
or isnull(a.[wEntityId],0)<>isnull(b.[wEntityId],0)
or isnull(a.[wAccountId],0)<>isnull(b.[wAccountId],0)
or isnull(a.[sCurrency],'')<>isnull(b.[sCurrency],'')
or isnull(cast(a.[dValue] as float),0.00)<>isnull(cast(b.[dValue] as float),0.00)
or isnull(a.[bDirty],0)<>isnull(b.[bDirty],0)
or isnull(cast(a.[dFactValue] as float),0.00)<>isnull(cast(b.[dFactValue] as float),0.00)
or isnull(a.[wTimeId],0)<>isnull(b.[wTimeId],0)
or isnull(a.[wExtDimId1],0)<>isnull(b.[wExtDimId1],0)
)
then update set
a.[sAccount]=b.[sAccount],
a.[sEntity]=b.[sEntity],
a.[wYear]=b.[wYear],
a.[wPeriod]=b.[wPeriod],
a.[wBracket]=b.[wBracket],
a.[wEntityId]=b.[wEntityId],
a.[wAccountId]=b.[wAccountId],
a.[sCurrency]=b.[sCurrency],
a.[dValue]=b.[dValue],
a.[bDirty]=b.[bDirty],
a.[dFactValue]=b.[dFactValue],
a.[wTimeId]=b.[wTimeId],
a.[wExtDimId1]=b.[wExtDimId1],
a.modifieddate=getdate(),
a.modifytype='UPDATE';
Test data would look something like this:
insert into [dbo].[slowmergetbl_source] values ('1234',2024,9,'Test1',0,'USD',500,1,500,4567,1788,202409,0,505489215);
insert into [dbo].[slowmergetbl_source] values ('6795',2024,10,'Test2',0,'USD',100,1,100,0986,8897,202410,0,515456210);
Seems to be running very slowly, sometimes few hours.
Any recommendations on how to make this better and speed up the merge?
Thanks.
A couple of suggestions:
check if you have index on dbo.slowmergetbl.NormalID
,
sometimes splitting MERGE into 3 separate operators (INSERT/UPDATE/UPDATE) can help, at least you will know which part causes the main delay, and can analyze its execution plan in details,
you have too complex logic for checking whether update is needed ("when matched" section). Sometimes logic like this can lead to nested loops appearing in the execution plan, and very poor performance. Consider adding "hash" column into both tables, calculate it based on all fields that you compare (sAccount, sEntity, ...), it should look like HASHBYTES('SHA2_256', CONCAT(sAccount,'|',sEntity,'|',...))
and use only this field to compare the data to decide if it needs to be updated.