mysqlgroup-bygreatest-n-per-groupfrequency

MySQL SELECT most frequent by group


How do I get the most frequently occurring category for each tag in MySQL? Ideally, I would want to simulate an aggregate function that would calculate the mode of a column.

SELECT 
  t.tag 
  , s.category 
FROM tags t 
LEFT JOIN stuff s 
USING (id) 
ORDER BY tag;

+------------------+----------+
| tag              | category |
+------------------+----------+
| automotive       |        8 |
| ba               |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |       10 |
| bamboo           |        8 |
| bamboo           |        9 |
| bamboo           |        8 |
| bamboo           |       10 |
| bamboo           |        8 |
| bamboo           |        9 |
| bamboo           |        8 |
| banana tree      |        8 |
| banana tree      |        8 |
| banana tree      |        8 |
| banana tree      |        8 |
| bath             |        9 |
+-----------------------------+

Solution

  • SELECT t1.*
    FROM (SELECT tag, category, COUNT(*) AS count
          FROM tags INNER JOIN stuff USING (id)
          GROUP BY tag, category) t1
    LEFT OUTER JOIN 
         (SELECT tag, category, COUNT(*) AS count
          FROM tags INNER JOIN stuff USING (id)
          GROUP BY tag, category) t2
      ON (t1.tag = t2.tag AND (t1.count < t2.count 
          OR t1.count = t2.count AND t1.category < t2.category))
    WHERE t2.tag IS NULL
    ORDER BY t1.count DESC;
    

    I agree this is kind of too much for a single SQL query. Any use of GROUP BY inside a subquery makes me wince. You can make it look simpler by using views:

    CREATE VIEW count_per_category AS
        SELECT tag, category, COUNT(*) AS count
        FROM tags INNER JOIN stuff USING (id)
        GROUP BY tag, category;
    
    SELECT t1.*
    FROM count_per_category t1
    LEFT OUTER JOIN count_per_category t2
      ON (t1.tag = t2.tag AND (t1.count < t2.count 
          OR t1.count = t2.count AND t1.category < t2.category))
    WHERE t2.tag IS NULL
    ORDER BY t1.count DESC;
    

    But it's basically doing the same work behind the scenes.

    You comment that you could do a similar operation easily in application code. So why don't you do that? Do the simpler query to get the counts per category:

    SELECT tag, category, COUNT(*) AS count
    FROM tags INNER JOIN stuff USING (id)
    GROUP BY tag, category;
    

    And sort through the result in application code.