sqlsql-serverquery-optimizationsql-execution-plan

Why does the query optimer choose a serial plan when filter made less restrictive


I'm working through Iztik Ben-Gan book T-SQL Querying. I have some trouble understanding why making a filter less restrictive will make the optimzer move away from a parallel plan to a serial one.

The first query goes for a parallel plan, but the second goes for a serial one.

SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROm dbo.Orders
WHERE orderid <=30000
SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROm dbo.Orders
WHERE orderid <=490000

Both queries use this index in for a Clustered index scans :

CREATE CLUSTERED INDEX [idx_cl_od] ON [dbo].[Orders]
(
    [orderdate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

The table has 1000000 rows with evenly distributed orderid's.

Anyone who knows the reason why?

Paralel plan with high rows numbers.


Solution

  • Parallelism has both costs and benefits.

    The more rows that survive the filter the more have to be packeted up and sent across the exchange to the serial thread. This imposes a cost and there will be a tipping point where the serial plan is costed cheaper - i.e. the scale down in CPU costs for the plan operators running in parallel is outweighed by the additional cost of the extra parallelism operator(s).

    The higher the DOP (degree of parallelism) or the narrower the estimated row size then the higher the tipping point will be here (in terms of estimated number of rows returned)


    Let's assume the following table structure and example data...

    CREATE TABLE [dbo].[Orders](
        [orderid] [int] NOT NULL,
        [custid] [int] NOT NULL,
        [empid] [int] NOT NULL,
        [shipperid] [int] NOT NULL,
        [orderdate] [datetime] NOT NULL  INDEX [idx_cl_od] CLUSTERED,
        [filler] [varchar](8000) NOT NULL
    ) 
    
    /*Generate some dummy rows of data*/
    INSERT INTO dbo.Orders
    SELECT value as [orderid], 
           1 as [custid], 
           1 as [empid], 
           1 as [shipperid], 
           getdate() as [orderdate], 
           'abcdefgh' as [filler]
    FROM generate_series(1,10000000);
    
    CREATE STATISTICS s_orderid ON dbo.Orders(orderid) WITH FULLSCAN;
    

    On my machine with the above setup I see the tipping point being around 167.7K rows (with less than that I get a parallel plan with more than that I get a serial plan)

    SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
    FROM dbo.Orders
    WHERE 1=1 /*Prevent simple parameterisation*/ 
    AND orderid <=167715; 
    
    SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
    FROM dbo.Orders
    WHERE 1=1 /*Prevent simple parameterisation*/ 
    AND orderid <=167716; 
    

    enter image description here

    My machine has an available DOP of 4. If I add an OPTION(MAXDOP 2) I see a different tipping point of 55837 vs 55838 rows.

    Why these tipping points?

    The cost of the serial plan being compared is not sensitive to the selectivity of the filter in this case.

    Whether I am using OrderId of 1 or 10,000,000 I see the same overall plan cost of ~ 75.54 for the following query

    SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
    FROM dbo.Orders
    WHERE 1=1 /*Prevent simple parameterisation*/ 
    AND orderid <=10000000
    OPTION (MAXDOP 1, QUERYTRACEON 9130) ; 
    

    Comprised of the following elements

    If I try forcing a parallel plan instead with a different hint OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 4, QUERYTRACEON 9130) I do see a different plan and some more variety in costs.

    OrderId Gather Streams CPU Cost Filter CPU Cost Clustered Index Scan CPU Cost Clustered Index Scan IO Cost Total
    1 0.0287819 1.2 2.75004 59.7402 63.7190219
    55837 3.96423 1.2 2.75004 59.7402 67.65447
    55838 3.9643 1.2 2.75004 59.7402 67.65454
    100000 7.07711 1.2 2.75004 59.7402 70.76735
    167715 11.8501 1.2 2.75004 59.7402 75.54034
    167716 11.8502 1.2 2.75004 59.7402 75.54044
    200000 14.1257 1.2 2.75004 59.7402 77.81594

    The above results show that the clustered index scan IO cost remains the same in the serial vs parallel plan and the CPU costs for the filter and clustered index scan were scaled down by a factor of 4 (reflecting the DOP) - 15.8002 * .75 is 11.85015 so the parallel plan is costed as cheaper until the row count going into the gather streams operator is enough for the operator cost to exceed that.

    With hint OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), MAXDOP 2, QUERYTRACEON 9130) the equivalent table is

    OrderId Gather Streams CPU Cost Filter CPU Cost Clustered Index Scan CPU Cost Clustered Index Scan IO Cost Total
    1 0.0287819 2.4 5.50008 59.7402 67.6690619
    55837 7.89997 2.4 5.50008 59.7402 75.54025
    55838 7.90011 2.4 5.50008 59.7402 75.54039
    100000 14.1257 2.4 5.50008 59.7402 81.76598
    167715 23.6717 2.4 5.50008 59.7402 91.31198
    167716 23.6718 2.4 5.50008 59.7402 91.31208
    200000 28.223 2.4 5.50008 59.7402 95.86328

    In this case the CPU costs for the operators running in parallel are only scaled down by 2 rather than 4.

    15.8002 * .5 is 7.90010 so this is the tipping point for the gather streams operator cost before the serial plan looks cheaper.

    The gather streams operator costs are also only scaled down by 2x rather than 4x so these are also larger than shown in the DOP 4 table for the equivalent row counts.

    Additional Notes