mysqlsqlgroup-bysql-order-byaggregate-functions

MySQL "Group By" and "Order By"


I want to be able to select a bunch of rows from a table of e-mails and group them by the from sender. My query looks like this:

SELECT 
    `timestamp`, `fromEmail`, `subject`
FROM `incomingEmails` 
GROUP BY LOWER(`fromEmail`) 
ORDER BY `timestamp` DESC

The query almost works as I want it — it selects records grouped by e-mail. The problem is that the subject and timestamp don't correspond to the most recent record for a particular e-mail address.

For example, it might return:

fromEmail: john@example.com, subject: hello
fromEmail: mark@example.com, subject: welcome

When the records in the database are:

fromEmail: john@example.com, subject: hello
fromEmail: john@example.com, subject: programming question
fromEmail: mark@example.com, subject: welcome

If the "programming question" subject is the most recent, how can I get MySQL to select that record when grouping the e-mails?


Solution

  • A simple solution is to wrap the query into a subselect with the ORDER statement first and applying the GROUP BY later:

    SELECT * FROM ( 
        SELECT `timestamp`, `fromEmail`, `subject`
        FROM `incomingEmails` 
        ORDER BY `timestamp` DESC
    ) AS tmp_table GROUP BY LOWER(`fromEmail`)
    

    This is similar to using the join but looks much nicer.

    Using non-aggregate columns in a SELECT with a GROUP BY clause is non-standard. MySQL will generally return the values of the first row it finds and discard the rest. Any ORDER BY clauses will only apply to the returned column value, not to the discarded ones.

    IMPORTANT UPDATE Selecting non-aggregate columns used to work in practice but should not be relied upon. Per the MySQL documentation "this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate."

    As of 5.7.5 ONLY_FULL_GROUP_BY is enabled by default so non-aggregate columns cause query errors (ER_WRONG_FIELD_WITH_GROUP)

    As @mikep points out below the solution is to use ANY_VALUE() from 5.7 and above

    See http://www.cafewebmaster.com/mysql-order-sort-group https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_any-value