sqlpostgresql

Cumulative sum with group by


I have a table with financial transactions, and want to create a running total per day. Is this the correct way to do it? Is there a better way? It seems to work, but I could not find this specific combination (SUM ... OVER with GROUP BY) on stackoverflow, and the nested SUM asks for an extra pair of eyes:

  SELECT date,
    SUM(SUM(amount)) over (PARTITION BY TRUE ORDER BY date) AS value
    FROM transactions
    GROUP BY date
    ORDER BY date ASC

Example data:

date                      amount

2024-11-01 00:00:00+00    -  1.9
2024-11-01 00:00:00+00    -  1.9
2024-11-02 00:00:00+00    -  1.9
2024-11-02 00:00:00+00      10
2024-11-05 00:00:00+00       3.9
2024-11-08 00:00:00+00    -  1.95
2024-11-13 00:00:00+00    - 10
2024-11-16 00:00:00+00      80
2024-11-18 00:00:00+00      85
2024-11-23 00:00:00+00     498.25
2024-11-27 00:00:00+00    -498.25
2024-11-30 00:00:00+00    -650

Expected result:

date                      amount

2024-11-01 00:00:00+00    -  3.8
2024-11-02 00:00:00+00       4.3
2024-11-05 00:00:00+00       8.2
2024-11-08 00:00:00+00       6.25
2024-11-13 00:00:00+00    -  3.75
2024-11-16 00:00:00+00      76.25
2024-11-18 00:00:00+00     161.25
2024-11-23 00:00:00+00     659.5
2024-11-27 00:00:00+00     161.25
2024-11-30 00:00:00+00    -488.75

Solution

  • Well, I think it's difficult to answer what is "correct" or "better" in this case. Obviously, your query is correct and produces the expected result. Except for the point that you will likely better cast the timestamp to a whole date to summarize data of entire days if the time differs.

    A simpler way to get the intended result would be DISTINCT ON:

    SELECT 
      DISTINCT ON (trans_date::DATE)
      trans_date::DATE AS trans_date,
      SUM(amount) OVER(ORDER BY trans_date)
    FROM transactions;
    

    But this seems to be slower than your current idea:

    SELECT 
      trans_date::DATE,
      SUM(SUM(amount)) OVER (ORDER BY trans_date::DATE) AS value
    FROM transactions
    GROUP BY trans_date::DATE
    ORDER BY trans_date::DATE;
    

    I tried both queries on this db<>fiddle with some random rows. According to the query plan, the DISTINCT ON requires more time.

    So in my opinion, your query is fine as it is.