I'm working through Iztik Ben-Gan book T-SQL Querying. I have some trouble understanding why making a filter less restrictive will make the optimzer move away from a parallel plan to a serial one.
The first query goes for a parallel plan, but the second goes for a serial one.
SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROm dbo.Orders
WHERE orderid <=30000
SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROm dbo.Orders
WHERE orderid <=490000
Both queries use this index in for a Clustered index scans :
CREATE CLUSTERED INDEX [idx_cl_od] ON [dbo].[Orders]
(
[orderdate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
The table has 1000000 rows with evenly distributed orderid's.
Anyone who knows the reason why?
Paralel plan with high rows numbers.
Parallelism has both costs and benefits.
The more rows that survive the filter the more have to be packeted up and sent across the exchange to the serial thread. This imposes a cost and there will be a tipping point where the serial plan is costed cheaper - i.e. the scale down in CPU costs for the plan operators running in parallel is outweighed by the additional cost of the extra parallelism operator(s).
The higher the DOP (degree of parallelism) or the narrower the estimated row size then the higher the tipping point will be here (in terms of estimated number of rows returned)
Let's assume the following table structure and example data...
CREATE TABLE [dbo].[Orders](
[orderid] [int] NOT NULL,
[custid] [int] NOT NULL,
[empid] [int] NOT NULL,
[shipperid] [int] NOT NULL,
[orderdate] [datetime] NOT NULL INDEX [idx_cl_od] CLUSTERED,
[filler] [varchar](8000) NOT NULL
)
/*Generate some dummy rows of data*/
INSERT INTO dbo.Orders
SELECT value as [orderid],
1 as [custid],
1 as [empid],
1 as [shipperid],
getdate() as [orderdate],
'abcdefgh' as [filler]
FROM generate_series(1,10000000);
CREATE STATISTICS s_orderid ON dbo.Orders(orderid) WITH FULLSCAN;
On my machine with the above setup I see the tipping point being around 167.7K rows (with less than that I get a parallel plan with more than that I get a serial plan)
SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROM dbo.Orders
WHERE 1=1 /*Prevent simple parameterisation*/
AND orderid <=167715;
SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROM dbo.Orders
WHERE 1=1 /*Prevent simple parameterisation*/
AND orderid <=167716;
My machine has an available DOP of 4. If I add an OPTION(MAXDOP 2)
I see a different tipping point of 55837
vs 55838
rows.
Why these tipping points?
The cost of the serial plan being compared is not sensitive to the selectivity of the filter in this case.
Whether I am using OrderId of 1
or 10,000,000
I see the same overall plan cost of ~ 75.54
for the following query
SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROM dbo.Orders
WHERE 1=1 /*Prevent simple parameterisation*/
AND orderid <=10000000
OPTION (MAXDOP 1, QUERYTRACEON 9130) ;
Comprised of the following elements
If I try forcing a parallel plan instead with a different hint OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 4, QUERYTRACEON 9130)
I do see a different plan and some more variety in costs.
OrderId | Gather Streams CPU Cost | Filter CPU Cost | Clustered Index Scan CPU Cost | Clustered Index Scan IO Cost | Total |
---|---|---|---|---|---|
1 | 0.0287819 | 1.2 | 2.75004 | 59.7402 | 63.7190219 |
55837 | 3.96423 | 1.2 | 2.75004 | 59.7402 | 67.65447 |
55838 | 3.9643 | 1.2 | 2.75004 | 59.7402 | 67.65454 |
100000 | 7.07711 | 1.2 | 2.75004 | 59.7402 | 70.76735 |
167715 | 11.8501 | 1.2 | 2.75004 | 59.7402 | 75.54034 |
167716 | 11.8502 | 1.2 | 2.75004 | 59.7402 | 75.54044 |
200000 | 14.1257 | 1.2 | 2.75004 | 59.7402 | 77.81594 |
The above results show that the clustered index scan IO cost remains the same in the serial vs parallel plan and the CPU costs for the filter and clustered index scan were scaled down by a factor of 4 (reflecting the DOP) - 15.8002 * .75
is 11.85015
so the parallel plan is costed as cheaper until the row count going into the gather streams operator is enough for the operator cost to exceed that.
With hint OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 2, QUERYTRACEON 9130)
the equivalent table is
OrderId | Gather Streams CPU Cost | Filter CPU Cost | Clustered Index Scan CPU Cost | Clustered Index Scan IO Cost | Total |
---|---|---|---|---|---|
1 | 0.0287819 | 2.4 | 5.50008 | 59.7402 | 67.6690619 |
55837 | 7.89997 | 2.4 | 5.50008 | 59.7402 | 75.54025 |
55838 | 7.90011 | 2.4 | 5.50008 | 59.7402 | 75.54039 |
100000 | 14.1257 | 2.4 | 5.50008 | 59.7402 | 81.76598 |
167715 | 23.6717 | 2.4 | 5.50008 | 59.7402 | 91.31198 |
167716 | 23.6718 | 2.4 | 5.50008 | 59.7402 | 91.31208 |
200000 | 28.223 | 2.4 | 5.50008 | 59.7402 | 95.86328 |
In this case the CPU costs for the operators running in parallel are only scaled down by 2 rather than 4.
15.8002 * .5 is 7.90010
so this is the tipping point for the gather streams operator cost before the serial plan looks cheaper.
The gather streams operator costs are also only scaled down by 2x
rather than 4x
so these are also larger than shown in the DOP 4
table for the equivalent row counts.
varchar(8000)
. If the estimated row size is reduced the estimated costs for the Gather Streams operator for a given estimated row count will similarly reduce meaning the tipping point will be higher.