I need to compute a running total for the "users" parameter. Here is the script (simplified):
with us_data as (
select
count(us.*) as us_reg,
count(us.*) filter (where status = 'active') as us_active,
date_trunc('day', us.created_at) as rp_date
from public.account_users us
group by rp_date)
select
sum(us_reg) over (rows between unbounded preceding and current row)::int as total_users,
rp_date
from us_data
order by rp_date;
Since the table grows every day, the script keeps getting slower, as it has to recount all users from the beginning. I could store the totals up to the previous day (or any other past day) in a different table, change the script to compute total_users only from that day onward, add the saved result, and store the combined data in the result table again.
Is there a more efficient solution to speed up the calculation of the running total of user accounts?
You can just run a window function over an aggregate function in the same SELECT.

us_active in the CTE was dead freight that I cut out. And ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is effectively the default, so you don't have to spell it out. (Strictly, the default frame with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, but since rp_date is unique after aggregation, both yield the same result.) But you did omit ORDER BY in the window function. I added it:
SELECT sum(count(*)::int) OVER (ORDER BY date_trunc('day', us.created_at)) AS total_users
, date_trunc('day', us.created_at) AS rp_date
FROM public.account_users us
GROUP BY rp_date
ORDER BY rp_date; -- !
Storing the running sum in some kind of materialized view (or a plain summary table) and only adding the last increment for "yesterday" will obviously make it a lot faster for a long series. It adds quite a bit of overhead, though, so the series has to go on for a while before the added cost pays off.
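A minimal sketch of that idea, assuming a plain summary table (daily_user_totals is a hypothetical name) that a daily job appends to, and assuming the job never skips a day:

-- Hypothetical summary table: one row per day with the cumulative count
CREATE TABLE daily_user_totals (
   rp_date      date PRIMARY KEY
 , total_users  int NOT NULL
);

-- Daily job (e.g. cron): previous total + yesterday's registrations.
-- Totals only grow, so max(total_users) is the latest stored total.
INSERT INTO daily_user_totals (rp_date, total_users)
SELECT current_date - 1
     , (SELECT COALESCE(max(total_users), 0) FROM daily_user_totals)
     + (SELECT count(*)::int
        FROM   public.account_users
        WHERE  created_at >= current_date - 1
        AND    created_at <  current_date);

-- Live total: stored total through yesterday plus today's new rows
SELECT (SELECT COALESCE(max(total_users), 0) FROM daily_user_totals)
     + (SELECT count(*)::int
        FROM   public.account_users
        WHERE  created_at >= current_date) AS total_users;

An index on created_at keeps both queries cheap, since they only ever touch the most recent rows. If the job can skip days, compute the increment from the last stored rp_date instead of a fixed "yesterday".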
For huge amounts of time series data, consider a specialized extension like TimescaleDB.
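A sketch of how that could look, assuming the timescaledb extension is installed (daily_registrations is an illustrative name; converting a non-empty table may also require unique constraints to include created_at):

-- Convert the table into a hypertable partitioned by created_at
SELECT create_hypertable('public.account_users', 'created_at', migrate_data => true);

-- Continuous aggregate: per-day counts maintained incrementally
CREATE MATERIALIZED VIEW daily_registrations
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', created_at) AS rp_date
     , count(*) AS us_reg
FROM   public.account_users
GROUP  BY 1;

-- The running total is then a cheap window over pre-aggregated days
SELECT sum(us_reg) OVER (ORDER BY rp_date)::int AS total_users
     , rp_date
FROM   daily_registrations
ORDER  BY rp_date;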