mysql set-operations

MySQL: set operations MINUS and INTERSECT on the same table


We need to track changing set membership over time. Our problem can be explained in terms of the following simple table:

+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| id        | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| timestamp | varchar(8)       | NO   |     | NULL    |                |
| member    | varchar(4)       | NO   |     | NULL    |                |
+-----------+------------------+------+-----+---------+----------------+

The result of the SELECT * operation is as follows:

SELECT * FROM my_table;
+----+-----------+--------+
| id | timestamp | member |
+----+-----------+--------+
|  1 | 20150101  | A      |
|  2 | 20150101  | B      |
|  3 | 20150101  | C      |
|  4 | 20180101  | A      |
|  5 | 20180101  | D      |
|  6 | 20180101  | E      |
+----+-----------+--------+

Logically, we can use the set operations MINUS and INTERSECT to find which members have been added, dropped, or retained over a period of time. For example, the following "logical" set operation gives the members added between 20150101 and 20180101:

SELECT member FROM my_table WHERE timestamp = "20180101"
MINUS
SELECT member FROM my_table WHERE timestamp = "20150101";

Similarly, the following "logical" set operation gives the members dropped between 20150101 and 20180101:

SELECT member FROM my_table WHERE timestamp = "20150101"
MINUS
SELECT member FROM my_table WHERE timestamp = "20180101";

The following "logical" set operation gives the members retained between 20150101 and 20180101:

SELECT member FROM my_table WHERE timestamp = "20150101"
INTERSECT
SELECT member FROM my_table WHERE timestamp = "20180101";

What would be the most elegant way to realize these logical set operations in MySQL?


Solution

  • A LEFT JOIN would most likely serve your needs best for the first two; the third is a simple INNER JOIN.

    members added

    SELECT t1.member 
    FROM my_table AS t1
    LEFT JOIN my_table AS t2 ON t1.member = t2.member AND t2.timestamp = '20150101'
    WHERE t1.timestamp = '20180101'
        AND t2.id IS NULL -- keeps only those members that have no matching row in t2
    ;
    

    members lost

    You should be able to just swap the timestamp values above.

    Using this generic form:

    SELECT t1.member 
    FROM my_table AS t1
    LEFT JOIN my_table AS t2 ON t1.member = t2.member AND t2.timestamp = Y
    WHERE t1.timestamp = X
        AND t2.id IS NULL -- this filters results to only those members who did not have an entry in t2
    ;
    

    The query returns the members recorded as present at timestamp X that are not recorded as present at timestamp Y. If X > Y, those are the members gained; if X < Y, those are the members lost.
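
To check the generic form against the sample rows from the question, here is a minimal sketch that runs the same LEFT JOIN ... IS NULL query with X and Y bound as parameters. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made only so the example is self-contained), and the helper name present_at_x_not_y is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption); the anti-join is plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " timestamp VARCHAR(8) NOT NULL,"
    " member VARCHAR(4) NOT NULL)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO my_table (timestamp, member) VALUES (?, ?)",
    [("20150101", "A"), ("20150101", "B"), ("20150101", "C"),
     ("20180101", "A"), ("20180101", "D"), ("20180101", "E")],
)

ANTI_JOIN = """
    SELECT t1.member
    FROM my_table AS t1
    LEFT JOIN my_table AS t2 ON t1.member = t2.member AND t2.timestamp = ?
    WHERE t1.timestamp = ?
      AND t2.id IS NULL
    ORDER BY t1.member
"""

def present_at_x_not_y(x, y):
    """Members recorded at timestamp x that have no row at timestamp y."""
    # Placeholder order in the SQL text: Y (in the ON clause) first, then X (in WHERE).
    return [row[0] for row in conn.execute(ANTI_JOIN, (y, x))]

added = present_at_x_not_y("20180101", "20150101")    # X > Y: members gained
dropped = present_at_x_not_y("20150101", "20180101")  # X < Y: members lost
print(added, dropped)
```

With the question's data this should list D and E as added and B and C as dropped. Comparing X and Y as strings works here only because the timestamps are stored in a fixed-width YYYYMMDD format.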

    members retained

    SELECT t1.member 
    FROM my_table AS t1
    INNER JOIN my_table AS t2 ON t1.member = t2.member 
       AND t2.timestamp = '20150101' -- This condition would more "properly" be
                                     -- part of the WHERE clause's conditions,
                                     -- but (unlike in the LEFT JOIN queries) whether it
                                     -- is in the ON or the WHERE will not change the results
    WHERE t1.timestamp = '20180101'
    ;
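
One caveat is worth checking: unlike INTERSECT, a join does not remove duplicates. If the same member can appear more than once for a given timestamp, the INNER JOIN returns it once per matching pair of rows. The sketch below illustrates this with one hypothetical duplicate row that is not in the question's data, again using Python's sqlite3 module as a stand-in for MySQL (an assumption for self-containment). Adding DISTINCT restores set semantics.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " timestamp VARCHAR(8) NOT NULL,"
    " member VARCHAR(4) NOT NULL)"
)
conn.executemany(
    "INSERT INTO my_table (timestamp, member) VALUES (?, ?)",
    [("20150101", "A"), ("20150101", "B"), ("20150101", "C"),
     ("20180101", "A"), ("20180101", "D"), ("20180101", "E"),
     ("20180101", "A")],  # hypothetical duplicate row, not in the question's data
)

INNER_JOIN = """
    SELECT t1.member
    FROM my_table AS t1
    INNER JOIN my_table AS t2 ON t1.member = t2.member AND t2.timestamp = ?
    WHERE t1.timestamp = ?
    ORDER BY t1.member
"""
# Same query, but with duplicates collapsed.
INNER_JOIN_DISTINCT = INNER_JOIN.replace("SELECT", "SELECT DISTINCT", 1)

# SQLite supports INTERSECT natively, so it serves as the reference result.
INTERSECT = """
    SELECT member FROM my_table WHERE timestamp = ?
    INTERSECT
    SELECT member FROM my_table WHERE timestamp = ?
"""

params = ("20150101", "20180101")
retained_join = [r[0] for r in conn.execute(INNER_JOIN, params)]
retained_distinct = [r[0] for r in conn.execute(INNER_JOIN_DISTINCT, params)]
retained_intersect = [r[0] for r in conn.execute(INTERSECT, params)]
print(retained_join, retained_distinct, retained_intersect)
```

The same consideration applies to the LEFT JOIN queries for members added and lost. If (timestamp, member) pairs are guaranteed unique, for example through a unique index, DISTINCT is unnecessary. Note also that MySQL 8.0.31 and later support INTERSECT and EXCEPT directly, so on those versions the "logical" queries from the question can be written almost as-is, with EXCEPT in place of MINUS.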