sqlmysqlmysql-5.7

SQL join that excludes rows that have a shared value with another row


Please refer to this db fiddle (SQL also pasted below):

https://www.db-fiddle.com/f/bJ5FfufvP9GHPYtBhkywdP/5

I would like to show rows from the "mpr_metrics" table that have NO associated reviews. But here's the catch: if there is a metric with an associated review, AND that review also shares the SAME start time as another review, then it should be excluded from the result set.

For example, the result of the fiddle is "3", but it should only show a count of "1", since there is only 1 metric (id=5) which has no review AND no review with the same start_time value as another review.

CREATE TABLE mpr_metrics (
  id INT,
  start_time TIMESTAMP,
  PRIMARY KEY (id)
);

CREATE TABLE mpr_metrics_reviews (
  id INT,
  metric_id INT,
  PRIMARY KEY (id)
);

INSERT INTO mpr_metrics (id, start_time) values (1, '2024-09-16');
INSERT INTO mpr_metrics (id, start_time) values (2, '2024-09-16');
INSERT INTO mpr_metrics (id, start_time) values (3, '2024-09-09');
INSERT INTO mpr_metrics (id, start_time) values (4, '2024-09-09');
INSERT INTO mpr_metrics (id, start_time) values (5, '2024-09-11');

INSERT INTO mpr_metrics_reviews (id, metric_id) values (1, 1);
INSERT INTO mpr_metrics_reviews (id, metric_id) values (2, 3);

SELECT count(DISTINCT mpr_metrics.start_time) 
FROM mpr_metrics 
LEFT JOIN mpr_metrics_reviews 
    ON mpr_metrics.id = mpr_metrics_reviews.metric_id 
WHERE mpr_metrics_reviews.id is null;

The result of this SELECT is "3", but the result should be "1".


Solution

  • Why that anti join trick? Do you think that MySQL is not capable of running straight-forward NOT IN and NOT EXISTS in a reasonable time? This may have been the case in the very early days of the DBMS, but those days are gone.

    You want two NOT EXISTS clauses, because you want metrics ...

    The simple query:

    SELECT COUNT(*) 
    FROM mpr_metrics m
    WHERE NOT EXISTS (SELECT null FROM mpr_metrics_reviews mr
                      WHERE mr.metric_id = m.id)
      AND NOT EXISTS (SELECT null FROM mpr_metrics m2
                      WHERE m2.id <> m.id AND m2.start_time = m.start_time)
    ;
    

    Demo: https://dbfiddle.uk/kghxWWSh

    (The first condition can easily be re-written as NOT IN, which may even be slightly more readable. While the second condition could also be converted into NOT IN, though, I suggest you stick to NOT EXISTS, because the IN clause would be correlated, and we strive to keep [NOT] IN clauses non-correlated for readability.)