sql-serversqlbulkcopysql-merge

SQL Server merge performance


Performing a merge against a table with 40 million rows takes many hours, on a database server with significant horsepower (64 GB memory, 16 cores).

I load data into a staging table via SqlBulkCopy, and then MERGE into a target table. The MERGE is performed in batches to minimize impact on TempDb.

The gist of my statement is:

DECLARE @RowID int = 0,
        @RowCount int,
        @Batches int = 0,
        @BatchSize int = 10000

SELECT @RowCount = COUNT(1) 
FROM [someStagingTable]

WHILE @RowID <= @RowCount 
BEGIN
    MERGE INTO [someTargetTable] AS Target
    USING (SELECT * FROM [someStagingTable] 
           WHERE ID BETWEEN @RowID AND @RowID + @BatchSize - 1) AS Source
              ON Target.AccountNumber = Source.AccountNumber

    WHEN MATCHED THEN
        UPDATE 
            SET ...

    WHEN NOT MATCHED BY TARGET THEN
        INSERT ...

    SET @RowID = @RowID + @BatchSize
    SET @Batches = @Batches + 1
 
    COMMIT
END

Regarding indexing:


Solution

  • In this particular case, despite the presence of an obvious index on [someTargetTable].AccountNumber, the MERGE statement needed an index hint:

    MERGE INTO [someTargetTable] WITH (INDEX=IX_someTargetTableAccountNumber) AS Target
      USING (SELECT * FROM [someStagingTable] WHERE ID BETWEEN @RowID AND @RowID + @BatchSize - 1) AS Source
      ON Target.AccountNumber = Source.AccountNumber
    

    This became obvious when inspecting the SQL execution plan, which started with a TABLE SCAN of [someTargetTable] in each batch. Adding the index hint reduced the execution time of each batch from ~300 seconds to ~1 second. The execution plan shifts from a TABLE SCAN to an INDEX SEEK, dropping rows read from 40M to 10K (the batch size).