mysqlfunctional-dependencies

Isn't "GROUP BY <primary key>" enough in this SQL query?


We have these tables in a database (simplified for this example):

Customer
--------
ID (primary key)
FirstName
LastName


Billing
-------
ID (primary key)
Customer_ID (foreign key to Customer.ID)
Date

Now we have this (My)SQL query given in our lesson handout (I always give the fully qualified names):

-- "For each customer, get the number of invoices they have submitted"
SELECT
  Customer.ID,
  Customer.FirstName,
  Customer.LastName,
  COUNT(Billing.ID)
FROM
  Customer
  INNER JOIN Billing ON Customer.ID = Billing.Customer_ID
GROUP BY
  Customer.ID,
  Customer.FirstName,
  Customer.LastName
ORDER BY COUNT(Billing.ID) DESC;

Is it necessary that we write GROUP BY Customer.ID, Customer.FirstName, Customer.LastName in the query, or is GROUP BY Customer.ID enough?

My teacher says that GROUP BY Customer.ID only works because the SQL Optimizer transforms it to GROUP BY Customer.ID, Customer.FirstName, Customer.LastName under the hood, which I do not understand, because Customer.ID already unique identifies each customer, which should be sufficient for the grouping.


Solution

  • My teacher says that GROUP BY Customer.ID only works because the SQL Optimizer transforms it to GROUP BY Customer.ID, Customer.FirstName, Customer.LastName under the hood

    Not correct. As P.Salmon already mentioned on the comment MySQL implements detection of functional dependence.

    From MySQL Handling of GROUP BY

    MySQL implements detection of functional dependence. If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them

    In simple words if PRIMAR KEY (which uniquely identifies every row on the table) is included on the GROUP BY clause, there is no need to add the other columns which are on the select clause not part of an aggregate function.


    My suggestion.

    On the GROUP BY section put all columns which are not part of an aggregate function. It will be much easier for future readers to understand the code , even if they do not know of functional dependencies.