mysqlgroupwisegroupwise-maximum

MySQL groupwise maximum performance issues on a million row table


I am trying to find a straightforward way to improve the performance for very active forums where there are a huge number of posts and mysql can no longer do table sorts in memory and doesn't seem to take full advantage of indexes.

This simple query finds the most recent post in each topic for a user to determine if they have any replies since (by later comparing the topic_time)

SELECT p.*, MAX(post_time) as post_time FROM forum_posts AS p   
WHERE p.poster_id = '1' AND p.post_status = '0' 
GROUP BY p.topic_id  
ORDER BY post_time DESC 
LIMIT 50

simple, flat table looks something like

post_id | poster_id | topic_id | post_status | post_time | post_text

However it's performance falls apart when there are a million posts and the user themselves have tens of thousands of posts. MySQL can either no longer sort the table in memory or there are far too many rows to scan. It can take up to 3 seconds in realworld use which is unacceptable imho because it's spiking the CPU during that time and slowing everyone else down.

I can make any combination of index of course, but mysql seems to mostly like to use a combo of

poster_id + post_time 

So it just selects the one user's 50k posts out of the million and then starts grouping by topic_id and sorting. Strangely adding topic_id into the index mix doesn't seem to help the performance, though it may be the order of the index fields?

I've tried to write an equivalent JOIN instead so I can use more than one index but I run into problems with the fact that each side has to be filtered by post_status and poster.

I was thinking it would be faster, at least for the first few pages, if mysql could be made to FIRST sort the data via it's index by post_time and then start picking out the distinct topic_id for the user in descending order. I guess that would require a subquery and not sure a 50k result subquery would be any better, still needs a temporary table.

Of course a fundamental solution would be to augment the core design so that there is another table which just stores the max post_time for each user in each topic but this is far too big of a change unless no other solution can be found.

Thank you for any suggestions!


adding realworld example and EXPLAIN:

slow log

# Query_time: 2.751334  Lock_time: 0.000056 Rows_sent: 40  Rows_examined: 48286
SELECT   p.*, MAX(post_time) as post_time FROM forum_posts AS p   WHERE p.poster_id = '2' AND p.post_status = '0' GROUP BY p.topic_id  ORDER BY post_time DESC LIMIT 7000, 40;

explain

select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
SIMPLE          p   ref poster_time poster_time 4   const   27072   Using where; Using temporary; Using filesort

Solution

  • First, fix your query to provide determinate results:

    SELECT p.topic_id, 
           MAX(post_time) as post_time 
    FROM forum_posts AS p   
    WHERE p.poster_id = '1' AND p.post_status = '0' 
    GROUP BY p.topic_id  
    ORDER BY post_time DESC 
      LIMIT 50 ;
    

    Then try it after adding an index on (post_status, poster_id, topic_id, post_time).