I would like to overlay a normal density curve, using the mean and variance of a model's residuals, on the histogram of those residuals.
I am using the code below from chapter 5 of fpp3:
aug <- google_2015 |>
  model(NAIVE(Close)) |>
  augment()
autoplot(aug, .innov) +
  labs(y = "$US",
       title = "Residuals from the naïve method")
aug |>
  ggplot(aes(x = .innov)) +
  geom_histogram() +
  labs(title = "Histogram of residuals")
It would have been helpful to tell us that fpp3 is "Forecasting: Principles and Practice" (3rd edition), that there is a corresponding R package, and to show the code that creates the google_2015 object so that we didn't have to go dig it out, but here you go:
library(fpp3)
google_2015 <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) >= 2015) |>
  mutate(day = row_number()) |>
  update_tsibble(index = day, regular = TRUE) |>
  filter(year(Date) == 2015)
aug <- google_2015 |>
  model(NAIVE(Close)) |>
  augment()
We need to (1) plot the histogram on a density rather than a count scale and (2) compute the required mean and SD on the fly. (Alternatively, we could keep the histogram on the count scale and multiply the Normal density by the number of observations times the bin width, but that seems slightly harder.)
The only improvement of this over the linked duplicate is that the aes(y = ..density..) idiom used in those answers is deprecated as of ggplot2 3.4.0 and will throw a warning; after_stat(density) is the replacement:
gg0 <- aug |>
  ggplot(aes(x = .innov)) +
  geom_histogram(aes(y = after_stat(density)))
gg0 + geom_function(fun = dnorm, colour = "red", n = 1001,
                    args = list(mean = mean(aug$.innov, na.rm = TRUE),
                                sd = sd(aug$.innov, na.rm = TRUE)))
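For completeness, here is a rough sketch of the count-scale alternative. The expected count in a bin is roughly the number of observations times the bin width times the density, so the Normal curve has to be scaled by both factors. The names res, bw, n_obs, scaled_dnorm and gg_count are made up for this illustration, and the bin width of 2 is an arbitrary choice, not something taken from fpp3.

```r
library(fpp3)  # also attaches ggplot2, dplyr, lubridate, tsibble, fable

# Rebuild the residuals as above so that this block runs on its own
google_2015 <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) >= 2015) |>
  mutate(day = row_number()) |>
  update_tsibble(index = day, regular = TRUE) |>
  filter(year(Date) == 2015)
aug <- google_2015 |>
  model(NAIVE(Close)) |>
  augment()

# Drop missing residuals (the naive method has no fitted value for the first day)
res <- aug$.innov[!is.na(aug$.innov)]
n_obs <- length(res)
bw <- 2  # assumed bin width, in the units of .innov ($US); pick what suits the data

# Normal density rescaled to the count scale: n * binwidth * density
scaled_dnorm <- function(x) {
  n_obs * bw * dnorm(x, mean = mean(res), sd = sd(res))
}

gg_count <- ggplot(data.frame(.innov = res), aes(x = .innov)) +
  geom_histogram(binwidth = bw) +
  geom_function(fun = scaled_dnorm, colour = "red", n = 1001)
gg_count
```

The bin width has to be set explicitly here (rather than left to the default of 30 bins) so that the same value can be used to scale the curve; that extra bookkeeping is why the density-scale version is a little simpler.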