sqlsql-serverquery-optimizationsql-execution-plan

DBMS optimizer - best execution plan, no matter the query's formulation


If one writes query in a relational DBMS, won't the optimizer choose the best way to execute it (depending on multiple factors) no matter how one formulates it?

I'm curious about SQL Server and Oracle.

For instance:

SELECT * 
FROM t1, t2
WHERE t1.some_column = t2.some_column

Provided the right indexes (with the right selectivity) exist, we should see index seeks followed maybe by key lookups. What we won't see is a cross product followed by a selection in the execution plan.

Then why does https://technet.microsoft.com/en-us/library/ms189575(v=sql.105).aspx state this?

In Transact-SQL, there is usually no performance difference between a statement that includes a subquery and a semantically equivalent version that does not. However, in some cases where existence must be checked, a join yields better performance.

No matter how you write a query and no matter its query class (SPJ, SPJ + UNION, SPJ + subqueries, etc), won't the optimizer find the best semantically equivalent version?


Solution

  • won't the optimizer choose the best way to execute it (depending on multiple factors) no matter how one formulates Q?

    I would like to quote Itzik Ben-Gan words from this book : Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions

    There are several reasons for this.

    For one, SQL Server’s optimizer is not perfect. I don’t want to sound unappreciative—SQL Server’s optimizer is truly a marvel when you think of what this software component can achieve. But it’s a fact that it doesn’t have all possible optimization rules encoded within it.

    Two, the optimizer has to limit the amount of time spent on optimization; otherwise, it could spend a much longer time optimizing a query than the amount of time the optimization shaves off from the run time of the query.

    The situation could be as absurd as producing a plan in a matter of several dozen milliseconds without going over all possible plans and getting a run time of only seconds, but producing all possible plans in hopes of shaving off a couple of seconds might take a year or even several. You can see that, for practical reasons, the optimizer needs to limit the time spent on optimization.

    Based on factors like the sizes of the tables involved in the query, SQL Server calculates two values: one is a cost consid- ered good enough for the query, and the other is the maximum amount of time to spend on optimization before stopping. If either threshold is reached, optimization stops, and SQL Server uses the best plan found at that point.

    In summary there are few statements that are optimized,few not