r ggplot2 plot smoothing

Why is my geom_smooth not flat as it should be?


I added a geom_smooth to my plot. In some cases it should be flat (all the points lie on the same horizontal line), but it shows a peak at the beginning.

I don't want to increase the span, because the current curve suits me for other values.

Here is my example:

test <- structure(
  list(
    Nom = c("255", "255", "255", "255", "255"),
    Sensibilite = c(2, 2, 2, 2, 2),
    Année_nota = c("2023", "2023", "2023", "2023", "2023"),
    month_day = structure(c(19118, 19145, 19145, 19157, 19277), class = "Date"),
    month_day_num = c(19118, 19145, 19145, 19157, 19277)
  ),
  row.names = c(NA, -5L),
  class = c("tbl_df", "tbl", "data.frame")
)

library(ggplot2)
library(ggrepel)  # provides geom_label_repel()

ggplot(test, aes(month_day, Sensibilite, label = Nom, color = Nom)) +
    # linewidth replaces size for lines since ggplot2 3.4.0
    geom_smooth(se = FALSE, linewidth = 0.5) +
    geom_point(size = 0.5) +
    # helper_label() is a user-defined function that is not shown in this question
    geom_label_repel(aes(label = after_stat(helper_label(x, color, PANEL, "max"))),
                     stat = "smooth", segment.size = 0.1,
                     max.overlaps = Inf, size = 2, na.rm = TRUE, direction = "y", nudge_x = 30
    ) +
    theme_light() +
    theme(legend.position = "none") +
    scale_y_continuous(limits = c(0, 6)) +
    scale_x_date(expand = c(0.15, 0), name = "Mois", date_breaks = "1 month", date_labels = "%b")

And the result: [plot image not included]

Thank you for your help!


Solution

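  • You'll notice a number of warnings after fitting the smooth. By default, loess fits a local polynomial of degree 2, and with only five points (two of them on the same date) the local window holds fewer data values than that fit has degrees of freedom.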

    library(ggplot2)
    test <- structure(
      list(
        Nom = c("255", "255", "255", "255", "255"),
        Sensibilite = c(2, 2, 2, 2, 2),
        Année_nota = c("2023", "2023", "2023", "2023", "2023"),
        month_day = structure(c(19118, 19145, 19145, 19157, 19277), class = "Date"),
        month_day_num = c(19118, 19145, 19145, 19157, 19277)
      ),
      row.names = c(NA, -5L),
      class = c("tbl_df", "tbl", "data.frame")
    )
    ggplot(test, aes(month_day, Sensibilite, label = Nom, color = Nom)) +
      geom_smooth(se = FALSE, linewidth=0.5) +
      geom_point(size =0.5)+ 
      theme_light()+
      theme(legend.position = "none")+
      scale_y_continuous(limits = c(0,6))+
      scale_x_date(expand = c(0.15, 0),name = "Mois",date_breaks = "1 month",date_labels = "%b")
    #> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
    #> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
    #> : span too small.  fewer data values than degrees of freedom.
    #> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
    #> : pseudoinverse used at 19117
    #> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
    #> : neighborhood radius 27.795
    #> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
    #> : reciprocal condition number 0
    #> Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
    #> : There are other near singularities as well. 17635
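
The same warnings can probably be reproduced outside ggplot2 by calling base R's `loess()` directly on the numeric dates, which is my understanding of what `geom_smooth()` does for small groups like this one. The following is only a minimal sketch: `warn_msgs` and `fit_default` are names made up for illustration, and the data frame keeps just the two columns the smoother uses.

```r
# Same x/y values as the example data (dates as day counts since 1970-01-01).
test <- data.frame(
  month_day_num = c(19118, 19145, 19145, 19157, 19277),
  Sensibilite   = c(2, 2, 2, 2, 2)
)

# Collect warnings instead of printing them; `warn_msgs` is an illustrative name.
warn_msgs <- character()
fit_default <- withCallingHandlers(
  # loess() defaults: span = 0.75, degree = 2 (local quadratic)
  loess(Sensibilite ~ month_day_num, data = test),
  warning = function(w) {
    warn_msgs <<- c(warn_msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Show the collected warning messages.
print(warn_msgs)

# Number of distinct x values available to the local quadratic fits.
length(unique(test$month_day_num))
```

With a span of 0.75 and five observations, each local window covers only a few points, and the duplicated date leaves even fewer distinct x values, which would explain why a local quadratic is poorly determined here.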
    

    You could solve this with a degree-0 smoother (just the local mean). That fixes the problem in this case and stops the warnings, because each local window has enough points to estimate a mean.

    
    ggplot(test, aes(month_day, Sensibilite, label = Nom, color = Nom)) +
      stat_smooth(se = FALSE, linewidth=0.5, method.args=list(degree=0), geom="line") +
      geom_point(size =0.5)+ 
      theme_light()+
      theme(legend.position = "none")+
      scale_y_continuous(limits = c(0,6))+
      scale_x_date(expand = c(0.15, 0),name = "Mois",date_breaks = "1 month",date_labels = "%b")
    #> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
    

    Created on 2024-06-05 with reprex v2.1.0
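
As a quick check that does not depend on the plot, the degree-0 fit can also be inspected with base R's `loess()`. This is a sketch under the assumption that `method.args = list(degree = 0)` is simply passed through to `loess()`; `fit_mean`, `grid` and `pred` are illustrative names, and the 80-point grid is just a convenient resolution.

```r
# Same x/y values as the example data (dates as day counts since 1970-01-01).
test <- data.frame(
  month_day_num = c(19118, 19145, 19145, 19157, 19277),
  Sensibilite   = c(2, 2, 2, 2, 2)
)

# degree = 0 fits a local (weighted) mean instead of a local quadratic.
fit_mean <- loess(Sensibilite ~ month_day_num, data = test, degree = 0)

# Predict on an evenly spaced grid across the observed date range.
grid <- data.frame(
  month_day_num = seq(min(test$month_day_num), max(test$month_day_num),
                      length.out = 80)
)
pred <- predict(fit_mean, newdata = grid)

# Smallest and largest fitted value; a flat line gives two equal numbers.
range(pred)
```

Since every observation has the same value, a local mean cannot overshoot, so the fitted values should all match that value. Note that `degree = 0` applies to every group in the plot, so it is worth checking that the curves for the other values still look acceptable.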