I am running the following SQL query in MySQL 8.0:
SELECT mp.id
FROM meme_post mp
JOIN meme_post_tag mpt ON mp.id = mpt.meme_post_id
WHERE deleted_at IS NULL AND mp.media_type = 'STATIC' AND mpt.tag_id IN (11, 30, 24) -- Selected tag IDs
GROUP BY mp.id
HAVING COUNT(DISTINCT mpt.tag_id) = 3
ORDER BY mp.created_at DESC
LIMIT 21 OFFSET 0;
The EXPLAIN output is:
1 SIMPLE mp ref PRIMARY,FK_meme_post_user,idx_deleted_at_created_at idx_deleted_at_created_at 9 const 166682 50.00 Using index condition; Using where; Using temporary; Using filesort
1 SIMPLE mpt ref idx_meme_post_tag_post_tag,FK_meme_post_tag_tag,FK_meme_post_tag_post_id idx_meme_post_tag_post_tag 8 findmymeme_db.mp.id 2 75.35 Using where; Using index
And the execution plan details:
-> Limit: 21 row(s) (actual time=1571..1571 rows=21 loops=1)
-> Sort: mp.created_at DESC (actual time=1571..1571 rows=21 loops=1)
-> Filter: (`count(distinct mpt.tag_id)` = 3) (actual time=802..1566 rows=8238 loops=1)
-> Stream results (cost=182898 rows=333364) (actual time=802..1556 rows=219023 loops=1)
-> Group aggregate: count(distinct mpt.tag_id) (cost=182898 rows=333364) (actual time=802..1411 rows=219023 loops=1)
-> Nested loop inner join (cost=145301 rows=375967) (actual time=802..1343 rows=322453 loops=1)
-> Sort: mp.id (cost=11577 rows=166682) (actual time=802..832 rows=249318 loops=1)
-> Filter: (mp.media_type = 'STATIC') (cost=11577 rows=166682) (actual time=0.206..642 rows=249318 loops=1)
-> Index lookup on mp using idx_deleted_at_created_at (deleted_at=NULL), with index condition: (mp.deleted_at IS NULL) (cost=11577 rows=166682) (actual time=0.204..615 rows=332220 loops=1)
-> Filter: (mpt.tag_id IN (11,30,24)) (cost=1.01 rows=2.26) (actual time=0.0015..0.00191 rows=1.29 loops=249318)
-> Covering index lookup on mpt using idx_meme_post_tag_post_tag (meme_post_id=mp.id) (cost=1.01 rows=2.99) (actual time=0.00129..0.00169 rows=3 loops=249318)
Additional Information: The index idx_meme_post_tag_post_tag is unique on (meme_post_id, tag_id). My Question: I noticed that MySQL is sorting by mp.id. However, starting from MySQL 8.0, GROUP BY no longer implicitly sorts the results. Then why is MySQL still sorting?
My assumption is that this sorting is happening because of the JOIN. The mpt table is indexed and stored in order of meme_post_id, and since we are performing a JOIN with mp, MySQL sorts mp.id to optimize the nested loop join.
Is this correct? Is MySQL choosing to sort mp.id explicitly to improve the efficiency of the nested loop join?
Any insights would be greatly appreciated!
In order to answer your question with 100% certainty one would need to debug MySQL or have under-the-hood information. Which I have not and probably readers of this question will also have not. Especially since you did not share the show create table
command's result of the tables involved.
But there is a way to figure it out.
First, check your MySQL version. It may be lower than 8.0 and then your premise was wrong and that's the answer.
If it's newer than that, then the next thing is to see where mp.id
appears inside the query. There are exactly three places:
select
clausejoin
conditiongroup by
clauseSo let's find out which the culprit is. We can exclude the select
clause from being a culprit, because it's just a statement that I need this field in memory to fill my result set.
So it's the join
condition or the group by
(not extensive or, of course).
So, in order to figure this out, take a sample of mp.id
values and run the explain
for this kind of query:
SELECT mp.id
FROM meme_post mp
JOIN meme_post_tag mpt ON mpt.meme_post_id in (<your mp.id values here>)
WHERE deleted_at IS NULL AND mp.media_type = 'STATIC' AND mpt.tag_id IN (11, 30, 24) -- Selected tag IDs
GROUP BY mp.id
HAVING COUNT(DISTINCT mpt.tag_id) = 3
ORDER BY mp.created_at DESC
LIMIT 21 OFFSET 0;
If it's not sorting it, then the experiment ends and the answer is that it is because of the join
condition.
If it's sorting it, then we know that group by
causes it, but we don't know whether the join
condition also causes it. So, run the query without the group by
and having
clause with mp.id
being back in the join
condition. If it's sorting it, then the join
condition is also a cause, but if not, then it's only the group by
that's causing it.