In the geom_smooth below, the line for 2023 is smoother than the line for 2024, even though the 2023 amounts have a larger SD (20) than the 2024 amounts (15). How can I fix this?
library(tidyverse)
library(lubridate)  # update() for Date objects
df_2023 <- data.frame(mdate = seq.Date(from = as.Date('2023-01-01'),
                                       to = as.Date('2023-12-31'), by = "1 day"),
                      amount = rnorm(365, mean = 4, sd = 20),
                      myear = '2023')
df_2024 <- data.frame(mdate = seq.Date(from = as.Date('2024-01-01'),
                                       to = as.Date('2024-06-28'), by = "1 day"),
                      amount = rnorm(180, mean = 4, sd = 15),
                      myear = '2024')
plot_data <- rbind(df_2023, df_2024)
plot_data %>%
  mutate(mdate_new = update(mdate, year = 2024)) %>%  # put both years on one date axis
  ggplot(aes(x = mdate_new, y = amount, color = myear)) +
  geom_line(alpha = 0.6) +  # constant transparency belongs outside aes()
  geom_smooth(se = FALSE)
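A quick way to compare the two years before plotting is to tabulate the number of daily points and the sample SD per year. This is a small sketch; the `set.seed()` call and the name `year_summary` are additions for illustration, not part of the original code.

```r
library(tidyverse)

set.seed(123)  # assumption: seed added so the numbers are reproducible
df_2023 <- data.frame(mdate = seq.Date(as.Date("2023-01-01"), as.Date("2023-12-31"), by = "1 day"),
                      amount = rnorm(365, mean = 4, sd = 20), myear = "2023")
df_2024 <- data.frame(mdate = seq.Date(as.Date("2024-01-01"), as.Date("2024-06-28"), by = "1 day"),
                      amount = rnorm(180, mean = 4, sd = 15), myear = "2024")
plot_data <- rbind(df_2023, df_2024)

# number of daily observations and sample SD for each year
year_summary <- plot_data %>%
  group_by(myear) %>%
  summarise(n = n(), sd = sd(amount))
year_summary
```

The `n` column shows that 2023 has roughly twice as many points as 2024 (365 vs. 180), which matters for how loess smooths each group.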
Maybe the 2023 smooth line is fitted to the whole year's data, which makes it smoother.
I changed the geom_smooth above to the code below, but it failed:
geom_smooth(aes(data= plot_data %>% filter(mdate_new <as.Date('2024-6-28'))))
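The attempt above fails because `data` is an argument of the layer, not an aesthetic, so it cannot go inside `aes()`; in addition, `mdate_new` only exists inside the pipe, not in `plot_data` itself. A sketch of what was probably intended, assuming ggplot2's documented support for passing a formula as a layer's `data` (it is turned into a function applied to the plot data): only the smoothing layer is restricted to dates up to 28 June, so both years are smoothed over the same half-year. The seed and the `<=` cut-off are my choices.

```r
library(tidyverse)
library(lubridate)

set.seed(42)  # assumption: seed added so the data is reproducible
df_2023 <- data.frame(mdate = seq.Date(as.Date("2023-01-01"), as.Date("2023-12-31"), by = "1 day"),
                      amount = rnorm(365, mean = 4, sd = 20), myear = "2023")
df_2024 <- data.frame(mdate = seq.Date(as.Date("2024-01-01"), as.Date("2024-06-28"), by = "1 day"),
                      amount = rnorm(180, mean = 4, sd = 15), myear = "2024")
# create mdate_new in plot_data itself so every layer can see it
plot_data <- rbind(df_2023, df_2024) %>%
  mutate(mdate_new = update(mdate, year = 2024))

p <- ggplot(plot_data, aes(x = mdate_new, y = amount, color = myear)) +
  geom_line(alpha = 0.6) +
  # `data` is a layer argument; the formula receives the plot data as .x
  geom_smooth(data = ~ filter(.x, mdate_new <= as.Date("2024-06-28")), se = FALSE)
p
```

The raw lines still show the whole of 2023; only the smooth uses the shortened range, so both smooths are fitted to about the same number of points.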
The documentation of stat_smooth tells you that, with the default method = NULL, loess is used (for fewer than 1,000 observations), and that it is used with its default parameters, specifically span = 0.75. The loess documentation explains that each local fit uses a neighbourhood containing a proportion of the data points, and that this proportion is defined by span, i.e., by default each neighbourhood contains 75 % of the points.
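The effect of span can be seen by calling stats::loess directly on one year of simulated daily noise. This sketch assumes the seed and the variable names `day`, `fit_default` and `fit_narrow`; it compares the equivalent number of parameters (`enp`, a component of the fitted loess object), which grows as the fit becomes more flexible.

```r
set.seed(1)  # assumption: seed added so the example is reproducible
day <- 1:365                             # one year of daily x values
amount <- rnorm(365, mean = 4, sd = 20)  # white noise like the 2023 data

# the model geom_smooth() fits by default here: loess with span = 0.75
fit_default <- loess(amount ~ day, span = 0.75)
# a smaller span uses fewer neighbours per local fit, so the curve can bend more
fit_narrow <- loess(amount ~ day, span = 0.75 * 180 / 365)

# equivalent number of parameters: larger means a more flexible (wigglier) fit
c(default = fit_default$enp, narrow = fit_narrow$enp)
```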
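Now, your two subsets contain very different numbers of data points (365 in 2023, 180 in 2024), so the default neighbourhood contains very different numbers of points (roughly 274 vs. 135 days), and the 2023 curve is averaged over a much wider window. You can correct that by scaling span so that both neighbourhoods contain the same number of points: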
plot_data <- plot_data %>% mutate(mdate_new = update(mdate, year = 2024))
n24 <- nrow(subset(plot_data, myear == 2024))
n23 <- nrow(subset(plot_data, myear == 2023))
ggplot(plot_data, aes(x = mdate_new, y = amount, color = myear)) +
  geom_line(alpha = 0.6) +
  # shrink span for 2023 so its neighbourhood holds as many points as 2024's
  geom_smooth(data = subset(plot_data, myear == 2023), se = FALSE, span = 0.75 * n24/n23) +
  geom_smooth(data = subset(plot_data, myear == 2024), se = FALSE, span = 0.75)
I don't show the output here because you didn't set a random seed and thus your data isn't fully reproducible.
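The span scaling in the code above works out as follows for this data (365 and 180 daily points). The neighbourhood sizes are approximate, since loess rounds the number of points it uses.

```r
n23 <- 365  # daily observations in 2023
n24 <- 180  # daily observations in 2024 (1 Jan to 28 Jun)

# approximate neighbourhood size (points, i.e. days) with the default span
window_default <- 0.75 * c(`2023` = n23, `2024` = n24)
window_default  # 273.75 vs 135

# span for 2023 that gives it the same neighbourhood size as 2024
span23 <- 0.75 * n24 / n23
round(span23, 2)  # about 0.37
span23 * n23      # 135 points, same as 2024
```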
PS: It might be preferable to fit an mgcv::gam model outside ggplot2. That gives you much finer control.
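A minimal sketch of that suggestion, assuming mgcv (a recommended package shipped with R) and fitting each year separately on a day-of-year scale. The `doy` and `fitted` columns, `k = 20` and `method = "REML"` are my choices for illustration, not part of the original answer.

```r
library(tidyverse)
library(lubridate)
library(mgcv)

set.seed(1)  # assumption: seed added so the simulated data is reproducible
plot_data <- rbind(
  data.frame(mdate = seq.Date(as.Date("2023-01-01"), as.Date("2023-12-31"), by = "1 day"),
             amount = rnorm(365, mean = 4, sd = 20), myear = "2023"),
  data.frame(mdate = seq.Date(as.Date("2024-01-01"), as.Date("2024-06-28"), by = "1 day"),
             amount = rnorm(180, mean = 4, sd = 15), myear = "2024")
)
plot_data$mdate_new <- update(plot_data$mdate, year = 2024)
plot_data$doy <- yday(plot_data$mdate_new)  # numeric day of year on the shared axis

# one penalised spline per year; REML estimates how wiggly each curve should be
# (k = 20, the maximum basis size, is an arbitrary choice for this sketch)
fits <- lapply(split(plot_data, plot_data$myear), function(d)
  gam(amount ~ s(doy, k = 20), data = d, method = "REML"))

# store each year's fitted values next to its rows (split() keeps row order)
plot_data$fitted <- NA_real_
for (yr in names(fits)) {
  plot_data$fitted[plot_data$myear == yr] <- fitted(fits[[yr]])
}

p <- ggplot(plot_data, aes(x = mdate_new, y = amount, color = myear)) +
  geom_line(alpha = 0.6) +
  geom_line(aes(y = fitted), linewidth = 1)
p
```

Here the smoothness is estimated from the data rather than set by a fixed proportion of points, and summary(fits[["2023"]]) reports the effective degrees of freedom. With pure white noise, as in this simulation, REML may well choose a nearly straight line for both years; the `sp` and `fx` arguments of s() let you impose a smoothness instead.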