I am trying to do a time series analysis project to learn more and test my own skills. For this time series analysis, I want to show moving averages and percent differences over the periods, daily, monthly, and annually.
I am doing this by making a temporary table containing a LAG function to get values to do the arithmetic for the moving averages and percent differences. I believe the issue of efficiency is from referencing the table too many times in the select query, and that's causing it to take forever to compile. Along with a large case function that is repeated three times. I am not the most familiar with SQL, but is there any way I can use recursion or make a function to call upon for a repeated expression? Any help to make this more efficient is greatly appreciated! I am also using MySQL for this project.
Code attached below.
WITH debt_lag AS(
SELECT *, LAG(`Debt Held by the Public`, 7) OVER (ORDER BY `Record Date`) AS week_b4_dhpb,
LAG(`Debt Held by the Public`, 7) OVER (ORDER BY `Record Date`) AS month_b4_dhpb,
LAG(`Debt Held by the Public`, 365) OVER (ORDER BY `Record Date`) AS year_b4_dhpb,
LAG(`Intragovernmental Holdings`, 7) OVER (ORDER BY `Record Date`) AS week_b4_ig,
LAG(`Intragovernmental Holdings`, 7) OVER (ORDER BY `Record Date`) AS month_b4_ig,
LAG(`Intragovernmental Holdings`, 365) OVER (ORDER BY `Record Date`) AS year_b4_ig,
LAG(`Total Public Debt Outstanding`, 7) OVER (ORDER BY `Record Date`) AS week_b4_tdpo,
LAG(`Total Public Debt Outstanding`, 7) OVER (ORDER BY `Record Date`) AS month_b4_tdpo,
LAG(`Total Public Debt Outstanding`, 365) OVER (ORDER BY `Record Date`) AS year_b4_tdpo
FROM DebtPenny_19930401_20250623
)
SELECT `Record Date`, (((`Debt Held by the Public` - week_b4_dhpb) / week_b4_dhpb )* 100) AS DHPB_percent_change_weekly,
ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 6 PRECEDING AND CURRENT ROW),3) AS DHPB_WEEKLY_MOVING_AVERAGE,
(((`Debt Held by the Public` - month_b4_dhpb) / month_b4_dhpb )* 100) AS DHPB_percent_change_monthly,
CASE
WHEN SUBSTRING(`Record Date`,6,2) = '01' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '02' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 27 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '03' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '04' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '05' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '06' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '07' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '08' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '09' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '10' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '11' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '12' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
END AS DHPB_MONTHLY_MOVING_AVERAGE,
(((`Debt Held by the Public` - year_b4_dhpb) / year_b4_dhpb )* 100) AS DHPB_percent_change_annual,
ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 364 PRECEDING AND CURRENT ROW),3) AS DHPB_ANNUAL_MOVING_AVERAGE,
(((`Intragovernmental Holdings` - week_b4_ig) / week_b4_ig )* 100) AS IG_percent_change_weekly,
ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 6 PRECEDING AND CURRENT ROW),3) AS IG_WEEKLY_MOVING_AVERAGE,
(((`Intragovernmental Holdings` - month_b4_ig) / month_b4_ig)* 100) AS IG_percent_change_monthly,
CASE
WHEN SUBSTRING(`Record Date`,6,2) = '01' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '02' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 27 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '03' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '04' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '05' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '06' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '07' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '08' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '09' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '10' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '11' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '12' THEN ROUND(AVG(`Intragovernmental Holdings`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
END AS IG_MONTHLY_MOVING_AVERAGE,
(((`Debt Held by the Public` - year_b4_ig) / year_b4_ig )* 100) AS IG_percent_change_annual,
ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 364 PRECEDING AND CURRENT ROW),3) AS IG_ANNUAL_MOVING_AVERAGE,
(((`Total Public Debt Outstanding` - week_b4_tdpo) / week_b4_tdpo )* 100) AS TDPO_percent_change_weekly,
ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 6 PRECEDING AND CURRENT ROW),3) AS TDPO_WEEKLY_MOVING_AVERAGE,
(((`Total Public Debt Outstanding` - month_b4_ig) / month_b4_ig)* 100) AS TDPO_percent_change_monthly,
CASE
WHEN SUBSTRING(`Record Date`,6,2) = '01' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '02' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 27 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '03' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '04' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '05' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '06' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '07' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '08' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '09' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '10' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '11' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN SUBSTRING(`Record Date`,6,2) = '12' THEN ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
END AS TDPO_MONTHLY_MOVING_AVERAGE,
(((`Total Public Debt Outstanding` - year_b4_tdpo) / year_b4_tdpo )* 100) AS TDPO_percent_change_annual,
ROUND(AVG(`Total Public Debt Outstanding`) OVER (ORDER BY `Record Date` ROWS BETWEEN 364 PRECEDING AND CURRENT ROW),3) AS TDPO_ANNUAL_MOVING_AVERAGE
FROM debt_lag;
Since all your WHEN
clauses are just performing an equality test of the same expression, you should change to the format:
CASE <expression>
WHEN <value1> THEN <result1>
WHEN <value2> THEN <result2>
...
END
So it should be:
CASE SUBSTRING(`Record Date`,6,2)
WHEN '01' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN '02' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 27 PRECEDING AND CURRENT ROW),3)
WHEN '03' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN '04' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN '05' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN '06' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN '07' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN '08' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN '09' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN '10' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
WHEN '11' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 29 PRECEDING AND CURRENT ROW),3)
WHEN '12' THEN ROUND(AVG(`Debt Held by the Public`) OVER (ORDER BY `Record Date` ROWS BETWEEN 30 PRECEDING AND CURRENT ROW),3)
END AS DHPB_MONTHLY_MOVING_AVERAGE,
This might not make any performance improvement if MySQL automatically performs common subexpression elimination, but it makes the code more readable.
Another option would be to have a table that maps month numbers to the number of days to average. You can then join SUBSTRING(
Record Date, 6, 2)
with that, and use the corresponding value in the window function.
If the datatype of Record Date
is DATE
or DATETIME
, you should use the MONTH()
function rather than SUBSTRING()
. The latter requires the more complex operation of formatting the date and then extracting the substring, rather than just calculating the month portion of the date (which will happen internally to the formatting function).