We have a need to track changing set membership over time. I can explain our problem in terms of the following simple table:
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| timestamp | varchar(8) | NO | | NULL | |
| member | varchar(4) | NO | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
The result of the SELECT * operation is as follows:
SELECT * FROM my_table;
+----+-----------+--------+
| id | timestamp | member |
+----+-----------+--------+
| 1 | 20150101 | A |
| 2 | 20150101 | B |
| 3 | 20150101 | C |
| 4 | 20180101 | A |
| 5 | 20180101 | D |
| 6 | 20180101 | E |
+----+-----------+--------+
Logically, we can use the set operations MINUS and INTERSECT to find out which members were added, dropped, or retained over a period of time. For example, the following "logical" set operation gives the members added between 20150101 and 20180101:
SELECT member FROM my_table WHERE timestamp = "20180101"
MINUS
SELECT member FROM my_table WHERE timestamp = "20150101";
Similarly, the following "logical" set operation gives the members dropped between 20150101 and 20180101:
SELECT member FROM my_table WHERE timestamp = "20150101"
MINUS
SELECT member FROM my_table WHERE timestamp = "20180101";
The following "logical" set operation gives the members retained between 20150101 and 20180101:
SELECT member FROM my_table WHERE timestamp = "20150101"
INTERSECT
SELECT member FROM my_table WHERE timestamp = "20180101";
What would be the most elegant way to realize these logical set operations in MySQL?
A LEFT JOIN would most likely serve your needs best for the first two; the third would be a simple INNER JOIN.
members added
SELECT t1.member
FROM my_table AS t1
LEFT JOIN my_table AS t2 ON t1.member = t2.member AND t2.timestamp = "20150101"
WHERE t1.timestamp = "20180101"
AND t2.id IS NULL -- this filters results to only those members who did not have an entry in t2
;
members lost
You should just be able to swap the timestamp values above.
Using this generic form:
SELECT t1.member
FROM my_table AS t1
LEFT JOIN my_table AS t2 ON t1.member = t2.member AND t2.timestamp = Y
WHERE t1.timestamp = X
AND t2.id IS NULL -- this filters results to only those members who did not have an entry in t2
;
The query basically gets the members recorded as present at timestamp X that are not recorded as present at timestamp Y. If X > Y, those are the members gained; if X < Y, those are the members lost.
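For instance, plugging X = 20150101 and Y = 20180101 into the generic form should give the members lost. This is only the generic query with the two values substituted; with the sample rows from the question I would expect it to return B and C.

```sql
-- Members lost: present at 20150101 (X) but with no matching row at 20180101 (Y).
SELECT t1.member
FROM my_table AS t1
LEFT JOIN my_table AS t2
       ON t1.member = t2.member
      AND t2.timestamp = '20180101'   -- Y: must stay in the ON clause
WHERE t1.timestamp = '20150101'       -- X
  AND t2.id IS NULL                   -- keep only members with no match in t2
;
```

The Y condition has to stay in the ON clause. If it were moved to the WHERE clause, the unmatched rows (where t2.timestamp is NULL) would be filtered out and the query would return nothing.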
members retained
SELECT t1.member
FROM my_table AS t1
INNER JOIN my_table AS t2 ON t1.member = t2.member
AND t2.timestamp = "20150101" -- This condition would be more "properly"
-- part of the WHERE clause's conditions,
-- but (unlike the LEFT JOIN queries) whether it
-- is in the ON or the WHERE will not change the results
WHERE t1.timestamp = "20180101"
;
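If you want to try all three queries against the sample data, something like the script below should work. It is a sketch only: the CREATE TABLE statement is my reconstruction from the column listing in the question, not the actual DDL, and the ORDER BY clauses are only there to make the output predictable.

```sql
-- Assumed schema, rebuilt from the DESCRIBE output in the question.
CREATE TABLE my_table (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `timestamp` VARCHAR(8)   NOT NULL,
  member      VARCHAR(4)   NOT NULL
);

INSERT INTO my_table (`timestamp`, member) VALUES
  ('20150101', 'A'), ('20150101', 'B'), ('20150101', 'C'),
  ('20180101', 'A'), ('20180101', 'D'), ('20180101', 'E');

-- Members added between 20150101 and 20180101: D, E
SELECT t1.member
FROM my_table AS t1
LEFT JOIN my_table AS t2
       ON t1.member = t2.member AND t2.timestamp = '20150101'
WHERE t1.timestamp = '20180101'
  AND t2.id IS NULL
ORDER BY t1.member;

-- Members lost between 20150101 and 20180101: B, C
SELECT t1.member
FROM my_table AS t1
LEFT JOIN my_table AS t2
       ON t1.member = t2.member AND t2.timestamp = '20180101'
WHERE t1.timestamp = '20150101'
  AND t2.id IS NULL
ORDER BY t1.member;

-- Members retained across both timestamps: A
SELECT t1.member
FROM my_table AS t1
INNER JOIN my_table AS t2
        ON t1.member = t2.member AND t2.timestamp = '20150101'
WHERE t1.timestamp = '20180101'
ORDER BY t1.member;
```

One difference from true set operations: MINUS and INTERSECT remove duplicates, while the joins do not. If the same member can appear more than once for a given timestamp, use SELECT DISTINCT t1.member. Also, as far as I know, MySQL 8.0.31 and later accept EXCEPT (the standard spelling of MINUS) and INTERSECT directly, so on a recent server the "logical" queries from the question may work almost as written.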