sqlsql-servermerge-statement

Is it safe to use a subquery as a source table for a merge statement?


I found some similar questions but none exactly matching my use case. When trying to use a MERGE statement Microsoft mentions the following:

Source

It's important to specify only the columns from the target table to use for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Don't attempt to improve query performance by filtering out rows in the target table in the ON clause; for example, such as specifying AND NOT target_table.column_x = value. Doing so can return unexpected and incorrect results.

In another place it mentions that you could use views to "filter" the source table.

Source

Define a view on the source or target that returns the filtered rows and reference the view as the source or target table. If the view is defined on the target table, any actions against it must satisfy the conditions for updating views. For more information about updating data by using a view, see Modifying Data Through a View.

But what I cannot find is whether or not you can "safely" use a subquery instead of the source table. For example:

MERGE TableA a
USING (
    SELECT *
    FROM TableB b
    WHERE
        b.Status = 5

) b
ON a.ID = b.ForeignID
... MATCH statements ...

Would this cause similar problems as using a CTE or filtering by using the ON statement?


Solution

  • These two methods are exactly the same. A view and a CTE and a derived table (what you are calling a subquery) and an inline TVF behave identically to the compiler: they are inlined into the query.

    The documentation is actually incorrect (note I've linked the latest docs):

    Use the WITH <common table expression> clause to filter out rows from the source or target tables. This method is similar to specifying additional search criteria in the ON clause and may produce incorrect results. We recommend that you avoid using this method or test thoroughly before implementing it.

    This is completely false as I will explain.

    MERGE functions like a FULL JOIN. The source and target are full-joined together, and the ON condition in the MERGE is the ON condition of the FULL JOIN. Then the WHEN MATCHED AND conditions go in a later filter like a WHERE. Sometimes the FULL JOIN becomes a LEFT or INNER if the match clauses are not present.

    So for example, the following

    MERGE TableA a
    USING (
        SELECT *
        FROM TableB b
        WHERE
            b.Status = 5
    ) b
    ON a.ID = b.ForeignID
    WHEN NOT MATCHED THEN
      INSERT (...)
      VALUES (...)
    WHEN MATCHED AND a.x <> b.x THEN
      UPDATE SET
        x = b.x;
    WHEN NOT MATCHED THEN
      DELETE;
    

    becomes pseudo-code like:

    SELECT  -- data to MERGE
      a.ID,
      b.ForeignID,
      b.x,
      CASE WHEN b.ForeignID IS NULL THEN 'INSERT'
           WHEN b.ForeignID IS NOT NULL AND a.x <> b.x THEN 'UPDATE'
           WHEN a.ID IS NULL THEN 'DELETE'
        END AS $action
    FROM TableA a
    FULL JOIN (
        SELECT *
        FROM TableB b
        WHERE
            b.Status = 5
    ) b
    ON a.ID = b.ForeignID
    WHERE (b.ForeignID IS NULL)  -- not matched
       OR (a.ID IS NOT NULL AND b.ForeignID IS NOT NULL AND a.x <> b.x) -- matched AND
       OR (a.ID IS NULL)  -- not matched by source
    ;
    

    As you can see, it would make no difference where the source or target come from, whether a table, view, derived table or iTVF.

    The key here is: if you want to merge to and from a subset of the data, then filter the source and target before you get to the ON. The ON clause should only be used for one thing: matching up rows. Do not put any other conditions in there.