Using stat_poly_line()
from package 'ggpmisc', one can fit a polynomial to data by default using lm()
as method. You can force the fit through zero with either: formula = y ~ x + 0
or formula = y ~ x - 1
. I cannot force it through a specific non-zero y-intercept for my linear model. In this case, I need to force it through 5.05.
Note: I recognize linear models are rarely statistically useful when the y-intercept is forced, but in my case I believe it is fine.
Here is my data:
mydata <- structure(list(y = c(20.2, 29.74, 22.37, 24.51,
37.2, 31.43, 43.05, 54.36, 65.44, 67.28, 46.02), x = c(0.422014140000002,
1.09152966, 1.3195521, 3.54231348, 2.79431778, 3.40756002, 5.58845772,
7.10762298, 9.70041246, 11.7199653, 15.89668266)), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
And here is a simplified version of my plot:
myplot <- ggplot(mydata, aes(x = x, y = y)) +
stat_poly_line(se = FALSE,
linetype = "dashed",
na.rm = TRUE,
formula = y ~ x + 0) +
stat_poly_eq(use_label(c("eq", "R2", "adj.R2")),
na.rm = TRUE,
formula = y ~ x + 0) +
geom_point(size = 2.5)
The x-value of the variable is 0 but I tried using 5.05 in that place to represent a y-intercept at 5.05 for the linear model (the x + 0 comes from the packages guide for how to put parabola intercepts at 0). This approach does not work, nor does using it on the y side of the formula either.
I could use another package relatively quickly, but I feel like there is a simple solution I can implement here.
Any help?
Interesting question! And you are correct, in that there is a solution within 'ggpmisc'. However, it may take a bit of familiarization before its feels simple...
stat_poly_line()
by default uses lm()
as method. So, as with lm()
the straightforward way of doing what you want is subtracting 5.05 from all the y-values, fitting with formula = y ~ x + 0
. The slope from the fit will be the one you want, and the intercept 5.05. So, you can use as formula = I(y - 5.05) ~ x + 0
. To get the correct line plotted, the subtracted value needs to be added back to the predicted values, which are returned in y
by the statistic. With the equation, some plotmath
trickery needs to be used to edit the equation label returned by the statistic.
For the example below, I used 20 instead of 5.05 as it felt more reasonable for the example data you provided. (As a side note: se=TRUE
could be used and valid, but would require adding the intercept, 20 in my exampele, to ymax
and ymin
in addition to to y
.)
library(ggpmisc)
#> Loading required package: ggpp
#> Loading required package: ggplot2
mydata <- structure(list(Y = c(20.2, 29.74, 22.37, 24.51,
37.2, 31.43, 43.05, 54.36, 65.44, 67.28, 46.02),
X = c(0.422014140000002,
1.09152966, 1.3195521, 3.54231348, 2.79431778,
3.40756002, 5.58845772,
7.10762298, 9.70041246, 11.7199653, 15.89668266)),
row.names = c(NA, -11L),
class = c("tbl_df", "tbl", "data.frame"))
myplot <- ggplot(mydata) +
stat_poly_line(se = FALSE,
linetype = "dashed",
na.rm = TRUE,
mapping = aes(x = X, y = stage(start = Y, after_stat = y + 20)),
formula = I(y - 20) ~ x + 0) +
stat_poly_eq(mapping = aes(X, Y,
label = after_stat(paste(eq.label, "~+~20*\", \"*", rr.label))),
na.rm = TRUE,
orientation = "x",
formula = I(y - 20) ~ x + 0) +
geom_point(mapping = aes(X, Y), size = 2.5) +
ylab("y") +
expand_limits(y = 0)
myplot
Created on 2023-10-13 with reprex v2.0.2