sqldatabasedatabase-designrelational-databaserelational-algebra

Exist and Not Exists Query Misunderstanding?


Schema of a product database:

Parts (pid, pname)
Suppliers (sid, sname)
Catalog (sid, pid)

sid in Catalog is a foreign key that references Suppliers and pid in Catalog is a foreign key that references Parts. (s1, p1) in Catalog shows "supplier s1 produces part p1".

How can I find details of parts that are not supplied by some suppliers?

I read some parts as follows:

SELECT *
FROM parts P
WHERE ... (
    SELECT S.sid 
    FROM suppliers 
    WHERE ... (
        SELECT * 
        FROM catalog C 
        WHERE s.sid = C.sid AND P.pid = C.pid))

Why do we use EXISTS for the first ... but NOT EXISTS for the second ...?


Solution

  • If I want find details of Parts that are not supplied by some suppliers, how we can do this?

    If you want "details for parts where there exists a supplier that doesn't supply that part" then your query, with EXISTS then NOT EXISTS, is correct.

    My problem is via - why at first? We use exist and in second ... we use not exist?

    I will explain why "details for parts where there exists a supplier that doesn't supply that part" can give your query. (With EXISTS then NOT EXISTS.)

    A table holds that rows that make some predicate (statement template) into a true proposition (statement):

    You need to express the predicate for your result. You want the P.pid-P.pname rows that make a true statement from this:

        "part P.pid is named P.pname"
    AND EXISTS S.sid ["supplier S.sid is named something"
            AND NOT "supplier S.sid supplies part P.pid"]
    

    But we must express this predicate in terms of the given predicates (plus conditions and logic operators) so that the DBMS can calculate the rows that satisfy it:

        "part P.pid is named P.pname"
    AND EXISTS S.sid [EXISTS S.sname "supplier S.sid is named S.sname"
            AND NOT EXISTS C.sid, C.pid [
                    "supplier C.sid supplies part P.pid" AND C.sid = S.sid AND C.pid = P.pid]]
    

    Now convert to SQL:

    SELECT *
    FROM Parts P
    WHERE EXISTS (SELECT S.sid FROM Suppliers S
            WHERE NOT EXISTS (SELECT 1 FROM Catalog C
                   WHERE C.sid = S.sid AND C.pid = P.pid))
    

    PS Your question's original "details of parts that are not supplied by some suppliers" is unclear/ambiguous. It could mean P.id-P.pname pairs where:

    1. "the parts are not supplied by all the suppliers" (the interpretation agreeing with your SQL)
    2. "the parts are not not supplied (by any of the suppliers)" (the interpretation agreeing with the two SQL versions in Shnugo's answer
    3. "the parts don't have multiple suppliers"

      SELECT * FROM Parts P
      WHERE NOT EXISTS (SELECT *
          FROM Catalog C1 JOIN Catalog C2 ON C1.sid <> C2.sid
          AND C1.pid = P.pid AND C2.pid = P.pid)
      

    If you have Parts 1-4 and Suppliers 1-3 then the above give three different answers after:

    INSERT INTO Catalog VALUES (1,1),(2,1),(2,2),(3,1),(3,2),(3,3);
    

    PPS Logic formulations for the above are:

    -- 1. "the parts are not supplied by all the suppliers"  
        P(P.pid, P.pname)
    AND EXISTS S.sid, S.sname [S(S.sid, S.sname) AND NOT C(S.sid, P.pid)]
    -- 2 "the parts are not supplied (by some of the suppliers)"
    P(P.pid, P.pname) AND NOT EXISTS C.sid C(C.sid, P.pid)
    -- 3. "the parts don't have multiple suppliers"
        P(P.pid, P.pname)
    AND NOT EXISTS C1.sid, C2.sid
            [C(C1.sid, P.pid) AND C(C2.sid, P.pid) AND C1.sid <> C2.sid]