sql-servert-sqlwindow-functionsleadsql-server-2022

Is this a bug with LEAD() IGNORE NULLS?


I was testing and comparing various forms of the LAST(), LEAD(), FIRST_VALUE() and LAST_VALUE() window functions and appear to have found a bug with LEAD() IGNORE NULLS.

Instead of the expected results, it seems to return the same values as LAG() IGNORE NULLS. I would expect it to return the same as FIRST_VALUE(Value) IGNORE NULLS OVER(ORDER BY Id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)

Am I correct to consider this a bug, or am I missing something? If a bug, what is the best place to report it?

-- With nulls
CREATE TABLE T (Id INT IDENTITY(1,1), Value INT)
INSERT T
VALUES
    (NULL), (11), (22), (33),
    (NULL), (44), (55), (66),
    (NULL), (77), (88), (99),
    (NULL)

SELECT *
    , LAG(Value) OVER(ORDER BY Id) AS Lag
    , LAG(Value) IGNORE NULLS OVER(ORDER BY Id) AS LagI
    , LEAD(Value) OVER(ORDER BY Id) AS Lead
    , LEAD(Value) IGNORE NULLS OVER(ORDER BY Id) AS [LeadI ???]
    , FIRST_VALUE(Value) IGNORE NULLS
          OVER(
              ORDER BY Id
              ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING -- Strictly following
          )
          AS [FirstISF (= expected LeadI)]
FROM T
ORDER BY T.Id

Results:

Id Value Lag LagI Lead LeadI ??? FirstISF (= expected LeadI)
1 null null null 11 null 11
2 11 null null 22 null 22
3 22 11 11 33 11 33
4 33 22 22 null 22 44
5 null 33 33 44 33 44
6 44 null 33 55 33 55
7 55 44 44 66 44 66
8 66 55 55 null 55 77
9 null 66 66 77 66 77
10 77 null 66 88 66 88
11 88 77 77 99 77 99
12 99 88 88 null 88 null
13 null 99 99 null 99 null

The above query (and a few others) are available in this db<>fiddle.


Solution

  • This is a bug - but looks like it has now been fixed (presumably 2278800 in CU4. The @@VERSION reported by DB Fiddle is 16.0.4080.1 - which is the RTM version of SQL Server 2022 with no bugfixes.

    On DB Fiddle for the following query...

    SELECT  LAG(Value) IGNORE NULLS OVER(ORDER BY Id) AS LagI,
            LEAD(Value) IGNORE NULLS OVER(ORDER BY Id) AS [LeadI ???]
    FROM T
    ORDER BY T.Id
    

    The execution plan is

     |--Stream Aggregate(GROUP BY:([WindowCount1009]) DEFINE:([Expr1003]=LAST_VALUE_IGNORE_NULLS([dbo].[T].[Value]), [Expr1004]=LAST_VALUE_IGNORE_NULLS([dbo].[T].[Value]), [dbo].[T].[Id]=ANY([dbo].[T].[Id])))
           |--Window Spool(ROWS BETWEEN:(UNBOUNDED, [BottomRowNumber1006]))
                |--Segment
                     |--Compute Scalar(DEFINE:([BottomRowNumber1006]=[RowNumber1005]-(1)))
                          |--Sequence Project(DEFINE:([RowNumber1005]=row_number))
                               |--Segment
                                    |--Sort(ORDER BY:([dbo].[T].[Id] ASC))
                                         |--Table Scan(OBJECT:([dbo].[T]))
    

    It only has one sort (Id Ascending) and one window spool etc.

    In CU 8 (returns correct results) the execution plan calculates both window functions separately (with both an ascending and a descending sort) and the plan has twice as many operators.


      |--Sort(ORDER BY:([dbo].[T].[Id] ASC))
           |--Stream Aggregate(GROUP BY:([WindowCount1022]) DEFINE:([Expr1004]=LAST_VALUE_IGNORE_NULLS([dbo].[T].[Value]), [dbo].[T].[Id]=ANY([dbo].[T].[Id]), [Expr1003]=ANY([Expr1003])))
                |--Window Spool(ROWS BETWEEN:(UNBOUNDED, [BottomRowNumber1016]))
                     |--Segment
                          |--Compute Scalar(DEFINE:([BottomRowNumber1016]=[RowNumber1015]-(1)))
                               |--Sequence Project(DEFINE:([RowNumber1015]=row_number))
                                    |--Segment
                                         |--Sort(ORDER BY:([dbo].[T].[Id] DESC))
                                              |--Stream Aggregate(GROUP BY:([WindowCount1019]) DEFINE:([Expr1003]=LAST_VALUE_IGNORE_NULLS([dbo].[T].[Value]), [dbo].[T].[Id]=ANY([dbo].[T].[Id]), [dbo].[T].[Value]=ANY([dbo].[T].[Value])))
                                                   |--Window Spool(ROWS BETWEEN:(UNBOUNDED, [BottomRowNumber1014]))
                                                        |--Segment
                                                             |--Compute Scalar(DEFINE:([BottomRowNumber1014]=[RowNumber1013]-(1)))
                                                                  |--Sequence Project(DEFINE:([RowNumber1013]=row_number))
                                                                       |--Segment
                                                                            |--Sort(ORDER BY:([dbo].[T].[Id] ASC))
                                                                                 |--Table Scan(OBJECT:([dbo].[T]))
    

    The above execution plan does actually have three sorts. One id asc, then id desc, then id asc for the final order by.

    This can be reduced to two by reversing the order of the columns in the SELECT list

    SELECT  LEAD(Value) IGNORE NULLS OVER(ORDER BY Id) AS [LeadI ???],
            LAG(Value) IGNORE NULLS OVER(ORDER BY Id) AS LagI
    FROM T
    ORDER BY T.Id