mysql

Why is MySQL Sorting by mp.id in This Query Execution Plan?


I am running the following SQL query in MySQL 8.0:

SELECT mp.id
FROM meme_post mp
JOIN meme_post_tag mpt ON mp.id = mpt.meme_post_id
WHERE deleted_at IS NULL AND mp.media_type = 'STATIC' AND mpt.tag_id IN (11, 30, 24) -- Selected tag IDs
GROUP BY mp.id
HAVING COUNT(DISTINCT mpt.tag_id) = 3 
ORDER BY mp.created_at DESC
LIMIT 21 OFFSET 0; 

The EXPLAIN output is:

    1   SIMPLE  mp      ref PRIMARY,FK_meme_post_user,idx_deleted_at_created_at idx_deleted_at_created_at   9   const   166682  50.00   Using index condition; Using where; Using temporary; Using filesort
1   SIMPLE  mpt     ref idx_meme_post_tag_post_tag,FK_meme_post_tag_tag,FK_meme_post_tag_post_id    idx_meme_post_tag_post_tag  8   findmymeme_db.mp.id 2   75.35   Using where; Using index

And the execution plan details:

    -> Limit: 21 row(s)  (actual time=1571..1571 rows=21 loops=1)
    -> Sort: mp.created_at DESC  (actual time=1571..1571 rows=21 loops=1)
        -> Filter: (`count(distinct mpt.tag_id)` = 3)  (actual time=802..1566 rows=8238 loops=1)
            -> Stream results  (cost=182898 rows=333364) (actual time=802..1556 rows=219023 loops=1)
                -> Group aggregate: count(distinct mpt.tag_id)  (cost=182898 rows=333364) (actual time=802..1411 rows=219023 loops=1)
                    -> Nested loop inner join  (cost=145301 rows=375967) (actual time=802..1343 rows=322453 loops=1)
                        -> Sort: mp.id  (cost=11577 rows=166682) (actual time=802..832 rows=249318 loops=1)
                            -> Filter: (mp.media_type = 'STATIC')  (cost=11577 rows=166682) (actual time=0.206..642 rows=249318 loops=1)
                                -> Index lookup on mp using idx_deleted_at_created_at (deleted_at=NULL), with index condition: (mp.deleted_at IS NULL)  (cost=11577 rows=166682) (actual time=0.204..615 rows=332220 loops=1)
                        -> Filter: (mpt.tag_id IN (11,30,24))  (cost=1.01 rows=2.26) (actual time=0.0015..0.00191 rows=1.29 loops=249318)
                            -> Covering index lookup on mpt using idx_meme_post_tag_post_tag (meme_post_id=mp.id)  (cost=1.01 rows=2.99) (actual time=0.00129..0.00169 rows=3 loops=249318)

Additional Information: The index idx_meme_post_tag_post_tag is unique on (meme_post_id, tag_id). My Question: I noticed that MySQL is sorting by mp.id. However, starting from MySQL 8.0, GROUP BY no longer implicitly sorts the results. Then why is MySQL still sorting?

My assumption is that this sorting is happening because of the JOIN. The mpt table is indexed and stored in order of meme_post_id, and since we are performing a JOIN with mp, MySQL sorts mp.id to optimize the nested loop join.

Is this correct? Is MySQL choosing to sort mp.id explicitly to improve the efficiency of the nested loop join?

Any insights would be greatly appreciated!


Solution

  • In order to answer your question with 100% certainty one would need to debug MySQL or have under-the-hood information. Which I have not and probably readers of this question will also have not. Especially since you did not share the show create table command's result of the tables involved.

    But there is a way to figure it out.

    First, check your MySQL version. It may be lower than 8.0 and then your premise was wrong and that's the answer.

    If it's newer than that, then the next thing is to see where mp.id appears inside the query. There are exactly three places:

    So let's find out which the culprit is. We can exclude the select clause from being a culprit, because it's just a statement that I need this field in memory to fill my result set.

    So it's the join condition or the group by (not extensive or, of course).

    So, in order to figure this out, take a sample of mp.id values and run the explain for this kind of query:

    SELECT mp.id
    FROM meme_post mp
    JOIN meme_post_tag mpt ON mpt.meme_post_id in (<your mp.id values here>)
    WHERE deleted_at IS NULL AND mp.media_type = 'STATIC' AND mpt.tag_id IN (11, 30, 24) -- Selected tag IDs
    GROUP BY mp.id
    HAVING COUNT(DISTINCT mpt.tag_id) = 3 
    ORDER BY mp.created_at DESC
    LIMIT 21 OFFSET 0; 
    

    If it's not sorting it, then the experiment ends and the answer is that it is because of the join condition.

    If it's sorting it, then we know that group by causes it, but we don't know whether the join condition also causes it. So, run the query without the group by and having clause with mp.id being back in the join condition. If it's sorting it, then the join condition is also a cause, but if not, then it's only the group by that's causing it.