sqlsql-order-bywindow-functions

Does order by in a window function add more frames?


Using :

sum(Age) over(partition by Country) as Total
Id          FirstName       LastName   Age         Country         Total      
----------- --------------- ---------- ----------- --------------- -----------
          5 Betty           Doe                 28 UAE                      28
          6 Fred            XXX                  1 UK                    99731
          7 Angela          XXX                 37 UK                    99731
          8 Dave            XXX               2000 UK                    99731
          9 Tony            YYY               9000 UK                    99731
         10 Tony            ZZZ                323 UK                    99731
         11 Pete            ZZZ              88323 UK                    99731
          3 David           Robinson            22 UK                    99731
          4 John            Reinhardt           25 UK                    99731
          1 John            Doe                 31 USA                      53
          2 Robert          Luna                22 USA                      53

I get what I expected. Which is for all the rows to be split into 3 "frames" (UAE, UK and USA), and the sum of the ages in each frame computed.

However with :

sum(Age) over(partition by Country order by LastName) as Total
Id          FirstName       LastName   Age         Country         Total      
----------- --------------- ---------- ----------- --------------- -----------
          5 Betty           Doe                 28 UAE                      28
          4 John            Reinhardt           25 UK                       25
          3 David           Robinson            22 UK                       47
          6 Fred            XXX                  1 UK                     2085
          7 Angela          XXX                 37 UK                     2085
          8 Dave            XXX               2000 UK                     2085
          9 Tony            YYY               9000 UK                    11085
         10 Tony            ZZZ                323 UK                    99731
         11 Pete            ZZZ              88323 UK                    99731
          1 John            Doe                 31 USA                      31
          2 Robert          Luna                22 USA                      53

I am not entirely sure what is happening.

It looks as though it is using the combination of "County" and "LastName" to create multiple frames, so the UK ones look to be

UK,Reinhardt UK,Robinson UK,XXX UK,YYY UK,ZZZ

And then it's computing the total in each of these frames. Almost like a GroupBy with 2 parameters, Group by Country, LastName and then summing these up as it goes through each row.

If that is the case, what is exactly happening with the following? and why doesn't it sum up each frame like it was doing above?

sum(Age) over(partition by Country, LastName order by FirstName) as Total
Id          FirstName       LastName   Age         Country         Total      
----------- --------------- ---------- ----------- --------------- -----------
          5 Betty           Doe                 28 UAE                      28
          4 John            Reinhardt           25 UK                       25
          3 David           Robinson            22 UK                       22
          7 Angela          XXX                 37 UK                       37
          8 Dave            XXX               2000 UK                     2037
          6 Fred            XXX                  1 UK                     2038
          9 Tony            YYY               9000 UK                     9000
         11 Pete            ZZZ              88323 UK                    88323
         10 Tony            ZZZ                323 UK                    88646
          1 John            Doe                 31 USA                      31
          2 Robert          Luna                22 USA                      22

Below is the test code

-- create
CREATE TABLE Customer (
  Id int,
  FirstName varchar(15),
  LastName varchar(10),
  Age int,
  Country varchar(15)
);

-- insert
INSERT INTO Customer(Id,FirstName,LastName,Age,Country) VALUES (1, 'John', 'Doe', 31, 'USA');
INSERT INTO Customer(Id,FirstName,LastName,Age,Country) VALUES (2, 'Robert', 'Luna', 22, 'USA');
INSERT INTO Customer(Id,FirstName,LastName,Age,Country) VALUES (3, 'David', 'Robinson', 22, 'UK');
INSERT INTO Customer(Id,FirstName,LastName,Age,Country) VALUES (4, 'John', 'Reinhardt', 25, 'UK');
INSERT INTO Customer(Id,FirstName,LastName,Age,Country) VALUES (5, 'Betty', 'Doe', 28, 'UAE');
INSERT INTO Customer(Id,FirstName,LastName,Age,Country) VALUES (6, 'Fred', 'XXX', 1, 'UK');
INSERT INTO Customer(Id,FirstName,LastName,Age,Country) VALUES (7, 'Angela', 'XXX', 37, 'UK');
INSERT INTO Customer(Id,FirstName,LastName,Age,Country) VALUES (8, 'Dave', 'XXX', 2000, 'UK');
INSERT INTO Customer(Id,FirstName,LastName,Age,Country) VALUES (9, 'Tony', 'YYY', 9000, 'UK');
INSERT INTO Customer(Id,FirstName,LastName,Age,Country) VALUES (10, 'Tony', 'ZZZ', 323, 'UK');
INSERT INTO Customer(Id,FirstName,LastName,Age,Country) VALUES (11, 'Pete', 'ZZZ', 88323, 'UK');


SELECT 
    Id,
    FirstName,
    LastName,
    Age,
    Country,
    sum(Age) over(partition by Country) as Total
    --sum(Age) over(partition by Country order by LastName) as Total
    --sum(Age) over(partition by Country, LastName order by FirstName) as Total
FROM Customer

Solution

  • Adding order by LastName returns not separate frames, but a cumulative sum of all rows up to (and including) the current row (using LastName order) within each frame.

    With partition by Country, LastName order by FirstName you are creating a frame for each Country, LastName combo, and returning the cumulative sum (using FirstName order) within each frame.