sql-serverover-clause

Windowed Functions and Query Optimizer


I have a table with the following structure.

|anId|     aDate|aNumber|
-------------------------
|   1|2018-01-20|    100|
|   1|2019-01-01|   -100|
|   1|2019-02-01|     10|
|   2|2019-01-02|     40|

I have a query to return, on a specific date, whether or not previous (inclusive) aNumbers sum is > 0 for each .

select
  anId,
  aDate,
  1 as aStatus
from (
  select
    anId,
    aDate,
    sum(aNumber) OVER (
      PARTITION BY anId
      ORDER BY aDate
      ROWS BETWEEN UNBOUNDED PRECEDING AND 0 PRECEDING
    ) as aSum
  from
    myTable
)
where
  aSum > 0
;

So this query would return

|anId|     aDate|aStatus|
-------------------------
|   1|2018-01-20|      1|
|   2|2019-01-02|      1|
|   1|2019-02-01|      1|

Now I've turned the query into a view myView. I'd like to query this view for date ranges. I may query the table daily/monthly/yearly whatever, but I want to be able to export the query results from one date range, and then export/append the results for the next date range.

select
  anId,
  aDate,
  aStatus
from
  myView
where
  aDate between (2018-01-01) and (2018-12-31)
;

Would return

|anId|     aDate|aStatus|
-------------------------
|   1|2018-01-20|      1|

And the next year

select
  anId,
  aDate,
  aStatus
from
  myView
where
  aDate between (2019-01-01) and (2019-12-31)
;

Should return

|anId|     aDate|aStatus|
-------------------------
|   2|2019-01-02|      1|
|   1|2019-02-01|      1|

Allowing me to stitch together the results to get the original, unfiltered, view records.


Ok, now that the stage is set, my concern with this approach is that when I filter the date from the view, it will impact the windowed function.

When I filter on 2019, will the windowed sum still include the 2018 aNumbers? Will my date range filter be applied in the inner select, prior to the sum?


Solution

  • After creating this question, I realized it should be simple enough to test it.

    CREATE TABLE [dbo].[myTable](
        [anId] [char](36) NOT NULL,
        [aDate] [datetime2](7) NULL,
        [aNumber] [int] NULL
    ) ON [PRIMARY]
    GO
    
    insert into myTable(anId,aDate,aNumber) values ('1','2018-01-20',100);
    insert into myTable(anId,aDate,aNumber) values ('1','2019-01-01',-100);
    insert into myTable(anId,aDate,aNumber) values ('1','2019-02-01',10);
    insert into myTable(anId,aDate,aNumber) values ('2','2019-01-20',40);
    

    Using a sub-select instead of creating the actual view

    select
        *
    from (
        select
          anId,
          aDate,
          1 as aStatus
        from (
          select
            anId,
            aDate,
            sum(aNumber) OVER (
              PARTITION BY anId
              ORDER BY aDate
              ROWS BETWEEN UNBOUNDED PRECEDING AND 0 PRECEDING
            ) as aSum
          from
            myTable
        ) a
        where
          a.aSum > 0
    ) b
    where
        b.aDate < '2019-01-01'
    ;
    

    Returns:

    |anId|     aDate|aStatus|
    -------------------------
    |   1|2018-01-20|      1|
    

    And

    select
        *
    from (
        select
          anId,
          aDate,
          1 as aStatus
        from (
          select
            anId,
            aDate,
            sum(aNumber) OVER (
              PARTITION BY anId
              ORDER BY aDate
              ROWS BETWEEN UNBOUNDED PRECEDING AND 0 PRECEDING
            ) as aSum
          from
            myTable
        ) a
        where
          a.aSum > 0
    ) b
    where
        b.aDate >= '2019-01-01'
    ;
    

    Returns:

    |anId|     aDate|aStatus|
    -------------------------
    |   2|2019-01-02|      1|
    |   1|2019-02-01|      1|
    

    This confirms that the date filter doesn't impact the sums. However it leaves me with some concern that the subquery is sub-optimal, as it may be running sums across significantly more data than necessary. ie. when I want 2018 data, is it still calculating sums for 2019 data?