sqldb2sqlj

Constructing an SQL Query in DB2 for SQLJ


I have a table called Lives:

create table Lives(
    animal varchar(10) not null,
    year int not null,
    zoo varchar(10),
    primary key(animal,year)
);

With the given data:

ANIMAL     YEAR        ZOO
---------- ----------- ----------
joe               2000 a
joe               2001 a
joe               2002 d
joe               2004 c
fred              2002 b
fred              2004 c
jane              2000 a
jane              2001 b
jane              2002 b
jack              2000 a
jack              2001 a
jack              2002 d
jack              2004 c
judy              2004 b
ruby              2003 d
alfred            2006 a

It consists of the name of the animal, a year and the zoo it was in that year.

I need a query that finds the pairs (a, b) of animals that have always been in the same zoo throughout all years, and such that a is lexicographically smaller than b (i.e., a < b). More precisely, such pairs (a, b) satisfy the following condition: if animal a lives in zoo z during year y then b also lives in zoo z during year y, and vice versa.

So the output for my example data would be:

Animal  Animal
------- -------
jack    joe

Up until now I've constructed this query:

SELECT l1.animal, l2.animal
FROM Lives as l1, Lives as l2 
WHERE l2.year = l1.year and l1.animal > l2.animal

It gives me animals that have been in the zoo for 1 year. I don't now how to continue from this.

I'm going to use this query in my sqlj program. Is it possible to construct a query that satisfies my desired result, or should I go on from my current query and implement the rest in sqlj?


Solution

  • I think what you want is an exact relational division, that returns all pairs such that one pair can not have any zoo or year that the other does not have.

    A common way to do this is to use a double negation through two correlated sub-queries. It's a bit hard to follow but it should give you the correct result.

    -- select all distinct pairs such that...
    SELECT * FROM (
        SELECT a.animal AS animal1, b.animal AS animal2
        FROM lives a
        INNER JOIN lives b ON a.zoo = b.zoo AND a.year = b.year AND a.animal < b.animal 
    ) animals
    WHERE NOT EXISTS (
           -- there does not exist any animal that is not...
            SELECT * FROM lives b
            WHERE b.animal = animals.animal2
            AND NOT EXISTS (
                 -- in the set of animals that share year and zoo
                    SELECT * FROM lives c
                    WHERE c.animal = animals.animal1
                    AND c.zoo = b.zoo AND c.year = c.year
                    )
            )
    GROUP BY animals.animal1, animals.animal2
    

    Using a simple count to determine equality won't work as you'll get a match when the count is the same even though one animal has more zoos than the other. To verify this add this row:

    ANIMAL  YEAR  ZOO
    jane    2004  b
    

    And the result you get from the accepted answer will be:

    animal  animal
    jane    jack
    joe     jack
    joe     jane
    

    Whereas my solution gives:

    animal1 animal2
    jack    joe