sql sql-server query-optimization sql-execution-plan

Why does the query optimer choose a serial plan when filter made less restrictive

I'm working through Iztik Ben-Gan book T-SQL Querying. I have some trouble understanding why making a filter less restrictive will make the optimzer move away from a parallel plan to a serial one.

The first query goes for a parallel plan, but the second goes for a serial one.

SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROm dbo.Orders
WHERE orderid <=30000

SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROm dbo.Orders
WHERE orderid <=490000

Both queries use this index in for a Clustered index scans :

CREATE CLUSTERED INDEX [idx_cl_od] ON [dbo].[Orders]
(
    [orderdate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

The table has 1000000 rows with evenly distributed orderid's.

Anyone who knows the reason why?

Paralel plan with high rows numbers.

Solution

Parallelism has both costs and benefits.

The more rows that survive the filter the more have to be packeted up and sent across the exchange to the serial thread. This imposes a cost and there will be a tipping point where the serial plan is costed cheaper - i.e. the scale down in CPU costs for the plan operators running in parallel is outweighed by the additional cost of the extra parallelism operator(s).

The higher the DOP (degree of parallelism) or the narrower the estimated row size then the higher the tipping point will be here (in terms of estimated number of rows returned)

Let's assume the following table structure and example data...

CREATE TABLE [dbo].[Orders](
    [orderid] [int] NOT NULL,
    [custid] [int] NOT NULL,
    [empid] [int] NOT NULL,
    [shipperid] [int] NOT NULL,
    [orderdate] [datetime] NOT NULL  INDEX [idx_cl_od] CLUSTERED,
    [filler] [varchar](8000) NOT NULL
) 

/*Generate some dummy rows of data*/
INSERT INTO dbo.Orders
SELECT value as [orderid], 
       1 as [custid], 
       1 as [empid], 
       1 as [shipperid], 
       getdate() as [orderdate], 
       'abcdefgh' as [filler]
FROM generate_series(1,10000000);

CREATE STATISTICS s_orderid ON dbo.Orders(orderid) WITH FULLSCAN;

On my machine with the above setup I see the tipping point being around 167.7K rows (with less than that I get a parallel plan with more than that I get a serial plan)

SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROM dbo.Orders
WHERE 1=1 /*Prevent simple parameterisation*/ 
AND orderid <=167715; 

SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROM dbo.Orders
WHERE 1=1 /*Prevent simple parameterisation*/ 
AND orderid <=167716;

My machine has an available DOP of 4. If I add an OPTION(MAXDOP 2) I see a different tipping point of 55837 vs 55838 rows.

Why these tipping points?

The cost of the serial plan being compared is not sensitive to the selectivity of the filter in this case.

Whether I am using OrderId of 1 or 10,000,000 I see the same overall plan cost of ~ 75.54 for the following query

SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROM dbo.Orders
WHERE 1=1 /*Prevent simple parameterisation*/ 
AND orderid <=10000000
OPTION (MAXDOP 1, QUERYTRACEON 9130) ;

Comprised of the following elements

Clustered Index Scan I/O Cost 59.7402
Clustered Index Scan CPU Cost 11.0002
Filter operator CPU Cost 4.8

If I try forcing a parallel plan instead with a different hint OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 4, QUERYTRACEON 9130) I do see a different plan and some more variety in costs.

OrderId	Gather Streams CPU Cost	Filter CPU Cost	Clustered Index Scan CPU Cost	Clustered Index Scan IO Cost	Total
1	0.0287819	1.2	2.75004	59.7402	63.7190219
55837	3.96423	1.2	2.75004	59.7402	67.65447
55838	3.9643	1.2	2.75004	59.7402	67.65454
100000	7.07711	1.2	2.75004	59.7402	70.76735
167715	11.8501	1.2	2.75004	59.7402	75.54034
167716	11.8502	1.2	2.75004	59.7402	75.54044
200000	14.1257	1.2	2.75004	59.7402	77.81594

The above results show that the clustered index scan IO cost remains the same in the serial vs parallel plan and the CPU costs for the filter and clustered index scan were scaled down by a factor of 4 (reflecting the DOP) - 15.8002 * .75 is 11.85015 so the parallel plan is costed as cheaper until the row count going into the gather streams operator is enough for the operator cost to exceed that.

With hint OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 2, QUERYTRACEON 9130) the equivalent table is

OrderId	Gather Streams CPU Cost	Filter CPU Cost	Clustered Index Scan CPU Cost	Clustered Index Scan IO Cost	Total
1	0.0287819	2.4	5.50008	59.7402	67.6690619
55837	7.89997	2.4	5.50008	59.7402	75.54025
55838	7.90011	2.4	5.50008	59.7402	75.54039
100000	14.1257	2.4	5.50008	59.7402	81.76598
167715	23.6717	2.4	5.50008	59.7402	91.31198
167716	23.6718	2.4	5.50008	59.7402	91.31208
200000	28.223	2.4	5.50008	59.7402	95.86328

In this case the CPU costs for the operators running in parallel are only scaled down by 2 rather than 4.

15.8002 * .5 is 7.90010 so this is the tipping point for the gather streams operator cost before the serial plan looks cheaper.

The gather streams operator costs are also only scaled down by 2x rather than 4x so these are also larger than shown in the DOP 4 table for the equivalent row counts.

Additional Notes

The reason for using trace flag 9130 above is so we see the costs the optimiser uses when comparing plans and don't lose information about the costs for the filter in a post optimization rewrite.
The estimated row size for this setup is very wide (4035 bytes) due to the Filler column being varchar(8000). If the estimated row size is reduced the estimated costs for the Gather Streams operator for a given estimated row count will similarly reduce meaning the tipping point will be higher.