Suppliers(sid, sname, address)
Parts(pid, pname, colour)
Catalog(sid, pid, cost)
The answer to 'find all the suppliers who supply every part' is:
SELECT C.sid
FROM Catalog C
WHERE NOT EXISTS (
SELECT P.pid
FROM Parts P
WHERE NOT EXISTS (
SELECT C1.sid
FROM Catalog C1
WHERE C1.sid = C.sid
AND C1.pid = P.pid
)
)
What is an explanation for this answer?
I've heard it explained as 'Find suppliers such that there does not exist a part that they do not sell', but I'm struggling to see how the following accomplishes that.
SELECT C1.sid
FROM Catalog C1
WHERE C1.sid = C.sid
AND C1.pid = P.pid
If Catalog is
James | Hammer
James | Anvil
James | Wrench
Henry | Hammer
Leroy | Anvil
and Part is
Hammer
Anvil
Wrench
then after the innermost clause what is returned?
Is it
James | Hammer
James | Anvil
James | Wrench
Henry | Hammer
Leroy | Anvil
and then does NOT EXISTS make it the following?
James | --
Henry | Anvil, Wrench
Leroy | Hammer, Wrench
How does using the Parts table then subtract these values?
This is a double nested NOT EXISTS
query (no really, that's what I've usually seen it called), and it is used specificially to answer this type of quesion, i.e., "Are there any x true for all y?"
Here's MySQL's page on EXISTS and NOT EXISTS, which mentions this technique specifically.
First, in the innermost SELECT
query, you are selecting parts that each store carries. Then, with the first NOT EXISTS
clause, you are selecting parts that are NOT carried by each store. Finally, in the outer NOT EXISTS
clause, you are selecting stores that returned an empty set for the inner NOT EXISTS
clause, meaning they carry every part.
Here is a SQLFiddle of this query in action.
A word of warning: if you are working with SQL, it is always good to think and work in sets, and thinking in linear terms like what is about to follow can get you into trouble quickly. Don't make it a habit!
However, sometimes when trying to figure out a complex query like this, it can help to think of these things as loops.
So, taking the data in the fiddle as an example, we have:
suppliers:
sid, name
9, 'AAA'
8, 'BBB'
7, 'CCC'
parts:
pid, name
1, 'wood'
2, 'stone'
3, 'paper'
catalog:
cid, pid, sid
1,1,9
2,2,9
3,1,8
4,1,7
5,2,7
6,3,7
So with this data, AAA carries wood and stone, BBB only carries wood, and CCC carries wood, stone, and paper.
Now let's step through the query row by row. We are selecting from suppliers
, and we are deciding which rows to include in the result set, so start with the first row in suppliers
: 9,'AAA'
. We'll call this row S
temporarily. We'll only include this row if there is nothing in the inner result set, so lets take a look at that.
suppliers:
sid, name
S => 9, 'AAA'
8, 'BBB'
7, 'CCC'
This result set is selecting from parts
, and we are going to go through it row by row. S
still equals 9,'AAA'
while we do this. So start with the first row in parts
: 1,'wood'
. We'll call this row P
for now. We'll only include this row in this first inner result set if there is nothing in the next level of result set, so let's move there. Remember that S
= 9,'AAA'
and P
= 1,'wood'
.
suppliers:
sid, name
S => 9, 'AAA'
8, 'BBB'
7, 'CCC'
parts:
pid, name
P => 1, 'wood'
2, 'stone'
3, 'paper'
This innermost query is selecting from 'catalog'. We're looking for any row, we'll call it C
, in catalog
where C.sid
equals S.sid
AND C.pid
equals P.pid
. This means that the current supplier carries the part. We want parts that the current supplier DOESN'T carry, that's why we invert the result set. We know that S
's sid is 9, and we know that P
's pid is 1. Are there any rows in C
that match that? The very first row in C
matches that, so we know that this result set is not empty.
suppliers:
sid, name
S => 9, 'AAA'
8, 'BBB'
7, 'CCC'
parts:
pid, name
P => 1, 'wood'
2, 'stone'
3, 'paper'
catalog:
cid, pid, sid
C => 1,1,9 --Match found! Don't include P in outer result set
2,2,9
3,1,8
4,1,7
5,2,7
6,3,7
Now jump back to the next outermost loop. The inner result set was not empty, so we know that 1,'wood
will not be part of this loop's result set. So we move to the next row in parts
, 2,'stone'
. We update the value of P
to equal this row. Should we include this row in the result set? We have to run the inner query again with our new value for P
(S
has still not changed). So we look for any rows in catalog
with sid
equal to 9 and pid
equal to 2. The second row matches, so there is a result set.
suppliers:
sid, name
S => 9, 'AAA'
8, 'BBB'
7, 'CCC'
parts:
pid, name
1, 'wood'
P => 2, 'stone'
3, 'paper'
catalog:
cid, pid, sid
1,1,9
C => 2,2,9 --Match found! Don't include P in outer result set
3,1,8
4,1,7
5,2,7
6,3,7
Jump back to the next outermost loop. The inner result set was not empty, so 2,'stone'
will not be part of this loop's result set.
When we go through all this again for 3,'paper'
, we find that there are no rows in catalog
that have sid = 9
and pid = 3
. The innermost result set is empty, so we know to include this value of P
in the next outermost loop.
suppliers:
sid, name
S => 9, 'AAA'
8, 'BBB'
7, 'CCC'
parts:
pid, name
1, 'wood'
2, 'stone'
P => 3, 'paper'
catalog:
cid, pid, sid
1,1,9
2,2,9
3,1,8
4,1,7
5,2,7
6,3,7
C => --No match found, include P in outer result set
We have gone through the entire parts
table at this point, so we have our final result set for the second loop, and it is not empty, which means that we found a part that S
does not carry, so we know that we cannot include the current value of S
in our final outer loop's result set.
So we move on to the next row in suppliers
and start the whole process over:
suppliers:
sid, name
9, 'AAA'
S => 8, 'BBB'
7, 'CCC'
Once we get to S = 7,'CCC'
we will go through all of these loops, and a match will be found in the inner loop for every value of P that is supplied, which means that that second loop will have an empty set. We couldn't find any parts that supplier doesn't carry, so that value of S
is added to the result set, meaning they carry everything!