sqlpostgresqlwindow-functions

In PostgreSQL do ranking window functions heed the window frame or act on the entire partition?


I am learning window functions, primarily with this page of the docs. I am trying to categorize the window functions according to whether they heed window frames, or ignore them and act on the partition.

This page of the docs clearly states first_value(), last_value(), and nth_value(), as well as aggregate functions, only consider the window frame. However, the docs don't say as directly whether the ranking functions (i.e. rank(), dense_rank(), percent_rank(), and cume_dist()) heed or ignore window frames.

My educated guess is that these ranking functions, along with all other window functions listed in Table 9.65 (excluding first_value(), last_value(), and nth_value(), and aggregate window functions) ignore window frames.

  1. Is this correct?
  2. Also, does the concept of peer groups in rank() and dense_rank() offer some intuition as to why these functions ignore window frames?

Solution

  • "Ranking" window functions ignore the frame within the window, because the frame itself is defined by the rank resulting from the window's order by.

    You skipped row_number(), which also belongs with other ranking functions, ignoring, or applying outside/above/before the frame.

    Be sure to take a look at Chapter 4.2.8. SQL Syntax. Value Expressions: Window Function Calls.

    Here are some examples to illustrate: demo at db<>fiddle

    select*,sum(x)over w1,sum(x)over w2,sum(x)over w3,sum(x)over w4
    from t
    window w1 as(order by y)
         , w2 as(order by y rows between unbounded preceding
                                     and current row)
         , w3 as(order by y rows between unbounded preceding
                                     and unbounded following)
         , w4 as(order by y rows between current row
                                     and unbounded following);
    
    y x sum sum sum sum
    1 2 2 2 30 30
    2 4 6 6 30 28
    3 6 12 12 30 24
    4 8 29 20 30 18
    4 9 29 29 30 10
    5 0 29 29 30 1
    6 1 30 30 30 1
    select*,first_value(x)over w1,first_value(x)over w2,first_value(x)over w3,first_value(x)over w4
           ,nth_value(x,2)over w1,nth_value(x,2)over w2,nth_value(x,2)over w3,nth_value(x,2)over w4
           ,last_value(x) over w1,last_value(x) over w2,last_value(x) over w3,last_value(x) over w4
    from t
    window w1 as(order by y)
         , w2 as(order by y rows between unbounded preceding
                                     and current row)
         , w3 as(order by y rows between unbounded preceding
                                     and unbounded following)
         , w4 as(order by y rows between current row
                                     and unbounded following);
    
    y x first first first first nth nth nth nth last last last last
    1 2 2 2 2 2 null null 4 4 2 2 8 8
    2 4 2 2 2 4 4 4 4 4 4 4 8 8
    3 4 2 2 2 4 4 4 4 0 4 4 8 8
    4 0 2 2 2 0 4 4 4 6 0 0 8 8
    5 6 2 2 2 6 4 4 4 8 6 6 8 8
    6 8 2 2 2 8 4 4 4 null 8 8 8 8

    No difference here:

    select*,row_number()over w1,row_number()over w2,row_number()over w3,row_number()over w4
           ,dense_rank()over w1,dense_rank()over w2,dense_rank()over w3,dense_rank()over w4
           ,rank()      over w1,rank()      over w2,rank()      over w3,rank()      over w4
    from t
    window w1 as(order by y)
         , w2 as(order by y rows between unbounded preceding
                                     and current row
                                 exclude current row)
         , w3 as(order by y rows between unbounded preceding
                                     and unbounded following
                                 exclude current row)
         , w4 as(order by y rows between current row
                                     and unbounded following
                                 exclude current row);
    
    y x number number number number dense dense dense dense rank rank rank rank
    1 2 1 1 1 1 1 1 1 1 1 1 1 1
    2 4 2 2 2 2 2 2 2 2 2 2 2 2
    3 4 3 3 3 3 3 3 3 3 3 3 3 3
    4 0 4 4 4 4 4 4 4 4 4 4 4 4
    5 6 5 5 5 5 5 5 5 5 5 5 5 5
    6 8 6 6 6 6 6 6 6 6 6 6 6 6
    select*,lag(x,2)over w1,lag(x,2)over w2,lag(x,2)over w3,lag(x,2)over w4
           ,lead(x,2)over w1,lead(x,2)over w2,lead(x,2)over w3,lead(x,2)over w4
    from t
    window w1 as(order by y rows between 1 preceding
                                     and current row)
         , w2 as(order by y rows between unbounded preceding
                                     and current row
                                 exclude current row)
         , w3 as(order by y rows between unbounded preceding
                                     and current row
                                 exclude ties)
         , w4 as(order by y rows between 1 preceding
                                     and current row
                                 exclude ties)
    
    y x lag lag lag lag lead lead lead lead
    1 2 null null null null 6 6 6 6
    2 4 null null null null 8 8 8 8
    3 6 2 2 2 2 9 9 9 9
    4 8 4 4 4 4 0 0 0 0
    4 9 6 6 6 6 1 1 1 1
    5 0 8 8 8 8 null null null null
    6 1 9 9 9 9 null null null null
    select*,cume_dist()over w1,cume_dist()over w2,cume_dist()over w3,cume_dist()over w4
    from t
    window w1 as(order by y)
         , w2 as(order by y rows between unbounded preceding
                                     and current row
                                 exclude current row)
         , w3 as(order by y rows between unbounded preceding
                                     and unbounded following
                                 exclude current row)
         , w4 as(order by y rows between current row
                                     and unbounded following
                                 exclude current row);
    
    y x cume_dist cume_dist cume_dist cume_dist
    1 2 0.14285714285714285 0.14285714285714285 0.14285714285714285 0.14285714285714285
    2 4 0.2857142857142857 0.2857142857142857 0.2857142857142857 0.2857142857142857
    3 6 0.42857142857142855 0.42857142857142855 0.42857142857142855 0.42857142857142855
    4 8 0.7142857142857143 0.7142857142857143 0.7142857142857143 0.7142857142857143
    4 9 0.7142857142857143 0.7142857142857143 0.7142857142857143 0.7142857142857143
    5 0 0.8571428571428571 0.8571428571428571 0.8571428571428571 0.8571428571428571
    6 1 1 1 1 1
    select*,ntile(3)over w1,ntile(3)over w2,ntile(3)over w3,ntile(3)over w4
           ,ntile(5)over w1,ntile(5)over w2,ntile(5)over w3,ntile(5)over w4
           ,ntile(6)over w1,ntile(6)over w2,ntile(6)over w3,ntile(6)over w4
    from t
    window w1 as(order by y)
         , w2 as(order by y rows between unbounded preceding
                                     and current row
                                 exclude current row)
         , w3 as(order by y rows between unbounded preceding
                                     and unbounded following
                                 exclude current row)
         , w4 as(order by y rows between current row
                                     and unbounded following
                                 exclude current row);
    
    y x ntile ntile ntile ntile ntile ntile ntile ntile ntile ntile ntile ntile
    1 2 1 1 1 1 1 1 1 1 1 1 1 1
    2 4 1 1 1 1 1 1 1 1 1 1 1 1
    3 6 1 1 1 1 2 2 2 2 2 2 2 2
    4 8 2 2 2 2 2 2 2 2 3 3 3 3
    4 9 2 2 2 2 3 3 3 3 4 4 4 4
    5 0 3 3 3 3 4 4 4 4 5 5 5 5
    6 1 3 3 3 3 5 5 5 5 6 6 6 6