sqlsubquerynotin

SQL Query to Identify Missing Records (NOT IN subquery)


I'm trying to write a query which will return every ID (shared value between both document types 6553 and 6554) for print jobs which generated for document type 6553, but not 6554 - these are meant to always generate together.

I've attempted the following, which takes quite a long time to run and produces a number of values which seems too high to me:

select ID from PrintQueueShadow
where DocumentType = '6553'
and CreateDate > getdate() - 7  --No more than 7 days old
and ID not in
(select ID from PrintQueueShadow
where CreateDate > getdate() - 7
and DocumentType = '6554')  --Checking against every ID for Doc Type 6554s sent in the last 7 days

Any help would be much appreciated. Thanks!


Solution

  • Your logic looks correct, but we can improve upon what you have by using exists logic instead. Also, we can try suggesting an index which might speed up the query.

    SELECT ID
    FROM PrintQueueShadow pqs1
    WHERE DocumentType = '6553' AND CreateDate > GETDATE() - 7 AND
          NOT EXISTS (
              SELECT 1
              FROM PrintQueueShadow pqs2
              WHERE pqs2.ID = pqs1.ID AND
                    CreateDate > GETDATE() - 7 AND 
                    DocumentType = '6554'
          );
    

    Using EXISTS might perform better than WHERE IN (...), as the former lets the database stop searching as soon as it finds a matching record in the subquery.

    The above query might benefit from the following index:

    CREATE INDEX idx ON PrintQueueShadow (ID, DocumentType, CreateDate);
    

    You may also try permutations of the three columns in the above index.