sql-serverexistsnot-exists

SQL: NOT IN vs NOT EXISTS strange behavior


In my searching for answers, I seem to only be finding explanations that cover the existence of NULL which is why the NOT IN returns 0 results. However, my scenario is exactly the opposite. I'm getting my expected results with the NOT IN and my NOT EXISTS is giving me 0. And to clarify, I have no NULLs in my sub-query. Here is my query:

DECLARE @EndDate DATE= CAST(CONCAT(YEAR(GETDATE()), '-', MONTH(GETDATE()), '-01') AS DATE) --First day of this month
DECLARE @StartDate DATE= DATEADD(month, -12, @EndDate) --12 months prior


SELECT Deactivated = COUNT(DISTINCT o.ClinicLocationId) 
       FROM [order].package p WITH(NOLOCK)
            INNER JOIN [order].[order] o WITH(NOLOCK) ON o.packageid = p.packageid
            INNER JOIN profile.ClinicLocationInfo cli WITH(NOLOCK) ON cli.LocationId = o.ClinicLocationId
                                                                      AND cli.FacilityType IN('CLINIC', 'HOSPITAL')
       WHERE CAST(p.ShipDTM AS DATE) >= dateadd(month,-1,@StartDate)
             AND CAST(p.ShipDTM AS DATE) < dateadd(month,-1,@EndDate)
             AND p.isshipped = 1
             AND o.IsShipped = 1
             AND ISNULL(o.iscanceled, 0) = 0
             and not exists (
             --and o.ClinicLocationId not in (
                                                SELECT DISTINCT o.ClinicLocationId 
                                                    FROM [order].package p WITH(NOLOCK)
                                                        INNER JOIN [order].[order] o WITH(NOLOCK) ON o.packageid = p.packageid
                                                        INNER JOIN profile.ClinicLocationInfo cli WITH(NOLOCK) ON cli.LocationId = o.ClinicLocationId
                                                                                                                    AND cli.FacilityType IN('CLINIC', 'HOSPITAL')
                                                    WHERE CAST(p.ShipDTM AS DATE) >= @StartDate
                                                            AND CAST(p.ShipDTM AS DATE) < dateadd(day,-1,@EndDate)
                                                            AND p.isshipped = 1
                                                            AND o.IsShipped = 1
                                                            AND ISNULL(o.iscanceled, 0) = 0
                                                )

For a high level overview, I'm basically trying to find the number of ID's that exist in one set that don't in the next (separated by a 12 month rolling window, offset by 1 month). But for the sake of simplicity, I've written the below that very simply illustrates the exact same symptom:

drop table if exists #T1, #T2

create table #T1 (id int)
create table #T2 (id int)

insert into #T1 (id)
values
(3),
(8)

insert into #T2 (id)
values
(671),
(171)

select id from #T1 where id not in (select id from #T2)
select id from #T1 where not exists (select id from #T2)

My expectation is that both of these would yield the same results, the contents of #T1 (3,8) but instead, I only get those results in the second query by eliminating the NOT. I would assume I'm suffering from a fundamental misunderstanding of how the EXISTS operator works, as up until now I assumed there was no real difference aside from how the scanning occurred and NULL handling.

Where am I going wrong with my expectation?


Solution

  • The query shape...

    and o.ClinicLocationId not in (SELECT o.ClinicLocationId ...)
    

    ...correlates o.ClinicLocationId to o.ClinicLocationId in the subquery.

    When using exists you have to write a correlated subquery to get the same effect:

    and not exists (SELECT o1.ClinicLocationId ...
        AND o1.ClinicLocationId = o.ClinicLocationId)
    

    Note that the second query requires a different alias in the subquery.