rggplot2lmggpmisc

How to force `stat_poly_line()` to use a specific non-zero y-intercept?


Using stat_poly_line() from package 'ggpmisc', one can fit a polynomial to data by default using lm() as method. You can force the fit through zero with either: formula = y ~ x + 0 or formula = y ~ x - 1. I cannot force it through a specific non-zero y-intercept for my linear model. In this case, I need to force it through 5.05.

Note: I recognize linear models are rarely statistically useful when the y-intercept is forced, but in my case I believe it is fine.

Here is my data:

mydata <- structure(list(y = c(20.2, 29.74, 22.37, 24.51, 
37.2, 31.43, 43.05, 54.36, 65.44, 67.28, 46.02), x = c(0.422014140000002, 
1.09152966, 1.3195521, 3.54231348, 2.79431778, 3.40756002, 5.58845772, 
7.10762298, 9.70041246, 11.7199653, 15.89668266)), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

And here is a simplified version of my plot:

myplot <- ggplot(mydata, aes(x = x, y = y)) +
  stat_poly_line(se = FALSE, 
                 linetype = "dashed", 
                 na.rm = TRUE, 
                 formula = y ~ x + 0) +
  stat_poly_eq(use_label(c("eq", "R2", "adj.R2")), 
               na.rm = TRUE, 
               formula = y ~ x + 0) +
  geom_point(size = 2.5) 

The x-value of the variable is 0 but I tried using 5.05 in that place to represent a y-intercept at 5.05 for the linear model (the x + 0 comes from the packages guide for how to put parabola intercepts at 0). This approach does not work, nor does using it on the y side of the formula either.

I could use another package relatively quickly, but I feel like there is a simple solution I can implement here.

Any help?


Solution

  • Interesting question! And you are correct, in that there is a solution within 'ggpmisc'. However, it may take a bit of familiarization before its feels simple...

    stat_poly_line() by default uses lm() as method. So, as with lm() the straightforward way of doing what you want is subtracting 5.05 from all the y-values, fitting with formula = y ~ x + 0. The slope from the fit will be the one you want, and the intercept 5.05. So, you can use as formula = I(y - 5.05) ~ x + 0. To get the correct line plotted, the subtracted value needs to be added back to the predicted values, which are returned in y by the statistic. With the equation, some plotmath trickery needs to be used to edit the equation label returned by the statistic.

    For the example below, I used 20 instead of 5.05 as it felt more reasonable for the example data you provided. (As a side note: se=TRUE could be used and valid, but would require adding the intercept, 20 in my exampele, to ymax and ymin in addition to to y.)

    library(ggpmisc)
    #> Loading required package: ggpp
    #> Loading required package: ggplot2
    
    mydata <- structure(list(Y = c(20.2, 29.74, 22.37, 24.51, 
                                   37.2, 31.43, 43.05, 54.36, 65.44, 67.28, 46.02), 
                             X = c(0.422014140000002, 
                                   1.09152966, 1.3195521, 3.54231348, 2.79431778, 
                                   3.40756002, 5.58845772, 
                                   7.10762298, 9.70041246, 11.7199653, 15.89668266)), 
                        row.names = c(NA, -11L), 
                        class = c("tbl_df", "tbl", "data.frame"))
    
    myplot <- ggplot(mydata) +
      stat_poly_line(se = FALSE, 
                     linetype = "dashed",
                     na.rm = TRUE, 
                     mapping = aes(x = X, y = stage(start = Y, after_stat = y + 20)),
                     formula = I(y - 20) ~ x + 0) +
      stat_poly_eq(mapping = aes(X, Y, 
                        label = after_stat(paste(eq.label, "~+~20*\", \"*", rr.label))), 
                   na.rm = TRUE, 
                   orientation = "x", 
                   formula = I(y - 20) ~ x + 0) +
      geom_point(mapping = aes(X, Y), size = 2.5) +
      ylab("y") +
      expand_limits(y = 0)
    
    myplot
    

    Created on 2023-10-13 with reprex v2.0.2