sqlsql-server-2008-r2inner-query

SQL Outer Query NOT IN Inner Query referencing Outer Query


I have a bit of a T-SQL conundrum that appears to be working but I was wondering if someone could attempt to give me a breakdown as to what is happening here.Consider the following script:

SELECT *
FROM TableA a
WHERE a.CustomerID NOT IN (SELECT b.CustomerID FROM TableB b WHERE a.CustomerID = b.CustomerID AND a.WorkOrder = b.WorkOrder)
AND a.[Date] > DATEADD(DD,-3,GETDATE())

I'm rather stumped as to how the compiler is not imploding on this script. How can it select where NOT IN on a subquery referencing the outer query? Get contents of TableA where CustomerID NOT IN CustomerID's from TableB etc... But when it finds a matching CustomerID in the subquery, the NOT IN kicks in and prevents the record from showing in the outer query select. I'm guessing this is where the compiler stops. But then because that specific CustomerID is not selected, it cannot join in the inner query, thus inner query does not select that CustomerID, then allowing the outer query to select that record? Yes? No? Falling down the rabbit hole? Is there a better way to write this?

Would appreciate if someone could elaborate on what happening here, or reference something that could explain. I could't really find anyone explaining this process, maybe not using the right search terms.

Thank you!


Solution

  • It is called a "Correlated subquery" and "the subquery may be evaluated once for each row processed by the outer query".

    So here, for each row of TableA, the subquery seeks the matching data from TableB and determines if the NOT IN condition is met. Then on to the next row in TableA to repeat that cycle until all relevant rows of TableA have been evaluated.

    An alternative approach could be a "left excluding join" when you join the 2 tables but then ignore rows where a join exists.

    SELECT
          *
    FROM TableA a
    LEFT JOIN TableB b ON a.CustomerID = b.CustomerID
                      AND a.WorkOrder = b.WorkOrder
    WHERE b.CustomerID IS NULL
    AND a.[Date] > DATEADD(DD, -3, GETDATE())
    ;
    

    or another "semi-join" alternative by using NOT EXISTS:

    SELECT
          *
    FROM TableA a
    WHERE NOT EXISTS (
          SELECT NULL
          FROM TableB b
          WHERE a.CustomerID = b.CustomerID
          AND a.WorkOrder = b.WorkOrder
          )
    AND a.[Date] > DATEADD(DD, -3, GETDATE())
    ;
    

    Note that the subquery used for |NOT| EXISTS doesn't have to return any values through the select clause. Some prefer to use "select 1" or "select *" when using EXISTS but in truth it really doesn't matter which is used. Using "select NULL" is my preference.

    You can learn more about these alternatives by inspecting execution plans, see http://sqlfiddle.com/#!6/04064/2 for example.

    Original query:enter image description here "Left excluding join" alternative:enter image description here "Not Exists" alternative:enter image description here