I’m a new SQL learner and a newbie to StackOverflow. Hope I didn't miss anything important for a first-time post.
I happened to get two following queries from my instructor saying they have different performance. But I couldn’t see why they are different in terms of the logic and computation cost.
Query 1:
SELECT First_Name,
SUM(total_sales_amount) AS sub_total_sales_amount FROM
(
select A.First_Name, C.product_quantity * D.Retail_Price AS t otal_sales_amount From join_demo.customer as A
inner join join_demo.customer_order as B on A.customer_id = B.customer_id
inner join join_demo.order_details C on B.order_id = C.order_id
inner join join_demo.product as D on C.product_id= D.product_id
) E
GROUP BY 1
ORDER BY sub_total_sales_amount DESC LIMIT 1;
Query 2 (I was told this one has better performance):
SELECT A.First_Name, SUM(C.product_quantity * D.Retail_Price) AS sub_total_sales_amount
From join_demo.customer as A
inner join join_demo.customer_order as B on A.customer_id = B.customer_id
inner join join_demo.order_details C on B.order_id = C.order_id
inner join join_demo.product as D on C.product_id= D.product_id GROUP BY 1
ORDER BY sub_total_sales_amount DESC LIMIT 1;
I’m running MySQL on my local Mac. But I suppose this one would be a general question regarding to SQL performance tuning. Could someone please shed light on this question? Much appreciated!
Updated:
Thanks @Tim and @MatBailie. I added EXPLAIN
before each query.
The results are exactly the same. I guess two queries are on the same level of performance.
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | A | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100 | Using temporary; Using filesort |
1 | SIMPLE | B | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 25 | Using where; Using join buffer (hash join) |
1 | SIMPLE | C | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 20 | Using where; Using join buffer (hash join) |
1 | SIMPLE | D | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 20 | Using where; Using join buffer (hash join) |
Old versions of MySQL used to automatically materialize derived tables (subqueries in the FROM
clause). "Materialize" means that MySQL runs the subquery and saves the results in a temporary location (in this case, before doing the aggregation).
I think the optimizer was improved starting with version 5.7 (although the history may be wrong). Nowadays, MySQL is smarter about materialization and will generally merge a subquery with the outer query.
Hence, more recent versions of MySQL should produce the same execution plan. Of course, optimizers can be confused and the optimizer may decide to materialize the subquery, which would slow down the query under most circumstances.
You can read more about this in the documentation.
You should also learn to use meaningful table aliases, such as c
for customers
. And, qualify all column references so it is clear where the columns come from. Arbitrary letters are probably worse than no aliases at all (assuming the columns are all qualified).