sqlpostgresqlwindow-functionscorrelated-subquery

Window function in scalar subquery does not work as expect


The following query can be used to calculate cumulative sums:

select 
  (select sum(c) over (ROWS UNBOUNDED PRECEDING) as a),
  sum(c) over (ROWS UNBOUNDED PRECEDING ) as b 
from (select unnest(ARRAY[1, 2, 3, 3, 4, 5]) as c) x

Column b gives the result as expected, while a does not.

a b
1 1
2 3
3 6
3 9
4 13
5 18

I've searched on Google and Stackoverflow but failed to find a similar problem.

Here are my questions:


Solution

  • Why does the simple select wrapper lead to a completely different result?

    Your output column a is the result of a correlated subquery, which is run once for every row produced by the underlying derived table x. Each of these computations sees only the single row it is computed for. So the sum is always the same as the single value.

    Your output column b, on the other hand, is based on the complete derived table x. So you see actual sums of multiple rows.

    Is it possible to keep the subquery in the SELECT list and still make the window function work as expect?

    No. Unless you join back to additional rows from the underlying relation x within the correlated subquery. You would have to generate x in a CTE to reuse it (or use an actual table), and that would be an odd, expensive construct, and most likely nonsense. (What would be the use case?)

    While you don't add an explicit ORDER BY in the window frame, b is not completely reliable, either. Well, it won't break for the simple query. But it might if you do more (like add joins etc.). See:

    And unnest() can stand as "table-function" on its own. No additional wrapper needed. Consider:

    SELECT sum(c) OVER (ORDER BY ord ROWS UNBOUNDED PRECEDING) AS b
    FROM   unnest(ARRAY[1, 2, 3, 3, 4, 5]) WITH ORDINALITY x(c, ord);
    

    Note that this sorts by original position in the array. Your example is ambiguous in this respect as the array has strictly ascending values which happen to sort the same when ordered by the value. I assume you care about the position.

    Aside: The default framing option for window functions is RANGE UNBOUNDED PRECEDING, so it matters (logically) to change that to ROWS UNBOUNDED PRECEDING while the sort order has peers - values sorting equally. And with no ORDER BY in the window frame, all rows are peers.

    It does not matter (logically) in my query because that sorts by the "ordinality" - with deterministic sort order, no peers.

    But it still matters for performance. ROWS is faster as it does not have to check for peers. Postgres 16 includes substantial performance improvements in this area. But sum() is unaffected since RANGE and ROWS can make an actual difference. It still pays to switch to ROWS when you know that the sort order is unambiguous, like in my query. See: