Currently using R 4.4.3 on Windows 11. I'm plotting the following data set with ggplot2
and performing a linear regression with geom_smooth
:
df <- data.frame(A= c(1.313, 1.3118, 1.3132, 1.3122, 1.3128, 1.3061, 1.3051, 1.3052, 1.3069, 1.3053, 1.3072, 1.3006, 1.3246,
1.3229, 1.3254, 1.3239, 1.3222, 1.3155, 1.313, 1.3147, 1.3174, 1.3174, 1.3188, 1.3134),
B=c(1.3165, 1.316, 1.3176, 1.316, 1.3169, 1.3104, 1.3094, 1.3095, 1.3107, 1.3101, 1.3112, 1.3047,
1.3285, 1.3271, 1.3297, 1.3274, 1.3261, 1.3192, 1.318, 1.319, 1.322, 1.3215, 1.3232, 1.3172))
I'm using stat_regline_equation
from the ggpubr
package to print out the regression, and I'm also printing out a reference line of y=x using geom_abline
. The code for my plot is shown below:
ggplot(df, aes(x=A, y=B)) +
geom_point() +
geom_abline(slope=1, intercept=0) +
geom_smooth(color='red', method = lm, se = FALSE, formula = y~x) +
stat_regline_equation(label.x=1.31, label.y=1.325) +
coord_equal()
The coord_equal
command forces the scaling of the two axes to be equal.
This is all well and good, but what I am needing to do is find the regression equation from this set of data if the slope of the line is fixed at 1 i.e. I'm hoping for an equation that looks like y=b + 1x
where b is the intercept if the regression is forced to have a slope m=1.
This is a small example; my full data-set will likely have multiple similar plots generated wtih facet_wrap
or facet_grid
, so I'm definitely hoping there's a way to do this such that this forced-slope regression can be repeated across multiple panels.
From a practical standpoint, the data represent measurements of several items measured on Tool A and Tool B. So this is a correlation plot between the two tools. Ideally, they would measure identically, but it's clear that Tool B measures slightly higher than Tool A. The regression equation gives me that correlation, but I'd prefer to assume a linear offset (which is reasonable over a small range). Hence my desire to fix the regression at m=1 and calculate the intercept, which will effectively be the mean offset.
I'm not locked into geom_smooth
or stat_regline_equation
, so if there are better regression tools available compatible with ggplot2
that can help, I'm all ears.
Thanks!
library(ggplot2)
library(dplyr)
set.seed(3)
## adding a type column to showcase facets
df %>%
mutate(type = rnorm(n()) < 0,
A = A + abs(rnorm(n())),
B = B + abs(rnorm(n()))) -> df_type
## lm(y ~ 1, offset = x) to get slope = 1
df_type %>%
summarize(intercept = lm(B ~ 1, offset = A)$coefficients,
.by = type) %>%
mutate(sign = ifelse(sign(intercept) == 1, "+", "-")) -> df_lm
ggplot(df_type, aes(x=A, y=B)) +
geom_point() +
geom_abline(slope=1, intercept=0) +
geom_smooth(color='red', method = lm, se = FALSE,
formula = y ~ 1 + offset(x*1)) +
geom_label(data = df_lm,
aes(x = -Inf, y = Inf,
label = paste("y = x", sign,
abs(round(intercept, 5)))),
hjust = -0.25, vjust = 2.5) +
facet_wrap(~type) +
coord_equal()
If you were doing it without faceting:
ggplot(df, aes(x = A, y = B)) +
geom_point() +
geom_abline(slope = 1, intercept = 0) +
geom_smooth(color='red', method = lm, se = FALSE,
formula = y ~ 1 + offset(x*1)) +
geom_label(aes(x = 1.315, y = 1.325,
label = paste("y = x", sign,
abs(round(intercept, 4)))),
inherit.aes = FALSE,
data = summarize(df,
intercept = lm(B ~ 1, offset = A)$coefficients,
sign = ifelse(sign(intercept) == 1, "+", "-"))) +
coord_equal()