I have multiple joins including left joins in mysql. There are two ways to do that.
I can put "ON" conditions right after each join:
select * from A join B ON(A.bid=B.ID) join C ON(B.cid=C.ID) join D ON(c.did=D.ID)
I can put them all in one "ON" clause:
select * from A join B join C join D ON(A.bid=B.ID AND B.cid=C.ID AND c.did=D.ID)
Which way is better?
Is it different if I need Left join or Right join in my query?
For simple uses MySQL will almost inevitably execute them in the same manner, so it is a manner of preference and readability (which is a great subject of debate).
However with more complex queries, particularly aggregate queries with OUTER JOIN
s that have the potential to become disk and io bound - there may be performance and unseen implications in not using a WHERE clause with OUTER JOIN queries.
The difference between a query that runs for 8 minutes, or .8 seconds may ultimately depend on the WHERE
clause, particularly as it relates to indexes (How MySQL uses Indexes): The WHERE
clause is a core part of providing the query optimizer the information it needs to do it's job and tell the engine how to execute the query in the most efficient way.
From How MySQL Optimizes Queries using WHERE:
"This section discusses optimizations that can be made for processing WHERE clauses...The best join combination for joining the tables is found by trying all possibilities. If all columns in ORDER BY and GROUP BY clauses come from the same table, that table is preferred first when joining."
For each table in a join, a simpler WHERE is constructed to get a fast WHERE evaluation for the table and also to skip rows as soon as possible
Some examples:
Full table scans (type = ALL) with NO Using where
in EXTRA
[SQL] SELECT cr.id,cr2.role FROM CReportsAL cr
LEFT JOIN CReportsCA cr2
ON cr.id = cr2.id AND cr.role = cr2.role AND cr.util = 1000
[Err] Out of memory
Uses where
to optimize results, with index (Using where
,Using index
):
[SQL] SELECT cr.id,cr2.role FROM CReportsAL cr
LEFT JOIN CReportsCA cr2
ON cr.id = cr2.id
WHERE cr.role = cr2.role
AND cr.util = 1000
515661 rows in set (0.124s)
****Combination of ON/WHERE - Same result - Same plan in EXPLAIN
*******
[SQL] SELECT cr.id,cr2.role FROM CReportsAL cr
LEFT JOIN CReportsCA cr2
ON cr.id = cr2.id
AND cr.role = cr2.role
WHERE cr.util = 1000
515661 rows in set (0.121s)
MySQL is typically smart enough to figure out simple queries like the above and will execute them similarly but in certain cases it will not.
Outer Join Query Performance:
As both LEFT JOIN and RIGHT JOIN are OUTER JOINS (Great in depth review here) the issue of the Cartesian product arises, the avoidance of Table Scans must be avoided, so that as many rows as possible not needed for the query are eliminated as fast as possible.
WHERE
, Indexes and the query optimizer used together may completely eliminate the problems posed by cartesian products when used carefully with aggregate functions like AVERAGE
, GROUP BY
, SUM
, DISTINCT
etc. orders of magnitude of decrease in run time is achieved with proper indexing by the user and utilization of the WHERE clause.
Finally
Again, for the majority of queries, the query optimizer will execute these in the same manner - making it a manner of preference but when query optimization becomes important, WHERE
is a very important tool. I have seen some performance increase in certain cases with INNER JOIN
by specifying an indexed col as an additional ON..AND ON clause but I could not tell you why.