sqlsql-serverouter-joinsql-except

Find missing entries in a SQL table conditional on criteria


I have modest simple SQL experience (using MS SQL server 2012 here) but this evades me. I wish to output distinct names from a table (previously successfully created from a join) which have some required entries missing, but conditional on the existence of another similar entry. For anyone who has location 90, I want to check they also have locations 10 and 20...

For example, consider this table:

Name    |Number |Location
--------|-------|--------
Alice   |136218 |90
Alice   |136218 |10
Alice   |136218 |20
Alice   |136218 |40
Bob     |121478 |10
Bob     |121478 |90
Chris   |147835 |20
Chris   |147835 |90
Don     |138396 |20
Don     |138396 |10
Emma    |136412 |10
Emma    |136412 |20
Emma    |136412 |90
Fred    |158647 |90
Gay     |154221 |90
Gay     |154221 |10
Gay     |154221 |30

So formally, I would like to obtain the Names (and Numbers) of those entries in the table who:

  1. Have an entry at location 90
  2. AND do not have all the other required location entries - in this case also 10 and 20.

So in the example above

The desired query output is therefore something like:

Name    |Number |Location
--------|-------|--------
Bob     |121478 |20
Chris   |147835 |10
Fred    |158647 |10
Fred    |158647 |20
Gay     |154221 |20

I've tried a few approaches with left/right joins where B.Key is null, and select from ... except but so far I can't quite get the logical approach correct. In the original table there are hundreds of thousands of entries and only a few tens of valid missing matches. Unfortunately I can't use anything that counts entries as the query has to be locations specific and there are other valid table entries at other locations outside of the desired ones.

I feel that the correct way to do this is something like a left outer join but as the starting table is the output of another join does this require declaring an intermediate table and then outer joining the intermediate table with its self? Note there is no requirement to fill in any gaps or enter items into the table.

Any advice would be very much appreciated.

===Answered and used code pasted here===

    --STEP 0: Create a CTE of all valid actual data in the ranges that we want
WITH ValidSplits AS
(
    SELECT DISTINCT C.StartNo, S.ChipNo, S.TimingPointId
    FROM Splits AS S INNER JOIN Competitors AS C                    
        ON  S.ChipNo = C.ChipNo                     
            AND (                               
                S.TimingPointId IN (SELECT TimingPointId FROM @TimingPointCheck)
                OR
                S.TimingPointId = @TimingPointMasterCheck
            )
),

--STEP 1: Create a CTE of the actual data that is specific to the precondition of passing @TimingPointMasterCheck
MasterSplits AS
(
    SELECT DISTINCT StartNo, ChipNo, TimingPointId 
    FROM ValidSplits
        WHERE TimingPointId = @TimingPointMasterCheck           
)

--STEP 2: Create table of the other data we wish to see, i.e. a representation of the StartNo, ChipNo and TimingPointId of the finishers at the locations in @TimingPointCheck
--The key part here is the CROSS JOIN which makes a copy of every Start/ChipNo for every TimingPointId
SELECT StartNo, ChipNo, Missing.TimingPointId
FROM MasterSplits
    CROSS JOIN (SELECT * FROM @TimingPointCheck) AS Missing(TimingPointId)
EXCEPT
    SELECT StartNo, ChipNo, TimingPointId FROM ValidSplits
ORDER BY StartNo

Solution

  • Welcome to Stack Overflow.

    What you need is a bit challenging, since you want to see data that do not exist. Thus, we first must create all possible rows, then substract the ones that exist

        select ppl_with_90.Name,ppl_with_90.Number,search_if_miss.Location 
        from
        (
            select distinct Name,Number
            from yourtable t
            where Location=90
        )ppl_with_90 -- All Name/Numbers that have the 90
        cross join (values (10),(20)) as search_if_miss(Location) -- For all the previous, combine them with both 10 and 20
    except -- remove the lines already existing
        select * 
        from yourtable 
        where Location in (10,20)