mysqlinner-joinaggregation

Why does this INNER JOIN query return all rows instead of just the one matching?


CREATE TABLE EMPLOYEE (
  empId INTEGER AUTO_INCREMENT PRIMARY KEY,
  name TEXT NOT NULL,
  dept TEXT NOT NULL
);

INSERT INTO EMPLOYEE(name, dept) VALUES ('Clark', 'Sales');
INSERT INTO EMPLOYEE(name, dept) VALUES ('Dave', 'Accounting');
INSERT INTO EMPLOYEE(name, dept) VALUES ('Ava', 'Sales');

SELECT * 
FROM EMPLOYEE AS a 
INNER JOIN EMPLOYEE b 
  ON a.empId = (SELECT MIN(b.empId));

Output:

+-------+-------+------------+-------+-------+------------+
| empId | name  | dept       | empId | name  | dept       |
+-------+-------+------------+-------+-------+------------+
|     1 | Clark | Sales      |     1 | Clark | Sales      |
|     2 | Dave  | Accounting |     2 | Dave  | Accounting |
|     3 | Ava   | Sales      |     3 | Ava   | Sales      |
+-------+-------+------------+-------+-------+------------+

I expect the query to return only the row with empId = 1.

Why does it returns all rows?

How is the INNER JOIN evaluating the ON clause with the subquery?

What is the correct behavior here?

How can I fix this to only get the row with the smallest empId?


Solution

  • Using aggregation functions in the ON clause of a join doesn't do anything useful, because in that position, b.empId doesn't refer to a set of rows, it only refers to one row at a time. So MIN(b.empId) is not looking for the minimum out of all values of empId, it's only looking for the minimum of the single value of b.empId that is current at the time the join expression is evaluated, which happens repeatedly, one row at a time.

    For the same reason, you can't use aggregate functions in the WHERE clause:

    ... WHERE a.empId = MIN(b.empId) -- ERROR
    

    The bottom line is that you may use aggregation functions in the select-list, or the HAVING clause, or ORDER BY clause, but not any other clause.

    You tried to use a subquery to put the MIN(b.empId) in a select-list, but since b.empId refers to a value of the outer query, and your subquery doesn't have a FROM clause of its own, it is really a single value within the scope of one row being evaluated for the join.

    An easier way to get the employee with the least empId is:

    SELECT * 
    FROM EMPLOYEE AS a 
    ORDER BY empId
    LIMIT 1