rggplot2linear-regressionggpubr

Plot the best fit linear regression with the slope set to a fixed value (m=1)


Currently using R 4.4.3 on Windows 11. I'm plotting the following data set with ggplot2 and performing a linear regression with geom_smooth:

df <- data.frame(A= c(1.313, 1.3118, 1.3132, 1.3122, 1.3128, 1.3061, 1.3051, 1.3052, 1.3069, 1.3053, 1.3072, 1.3006, 1.3246, 
1.3229, 1.3254, 1.3239, 1.3222, 1.3155, 1.313, 1.3147, 1.3174, 1.3174, 1.3188, 1.3134),
B=c(1.3165, 1.316, 1.3176, 1.316, 1.3169, 1.3104, 1.3094, 1.3095, 1.3107, 1.3101, 1.3112, 1.3047, 
1.3285, 1.3271, 1.3297, 1.3274, 1.3261, 1.3192, 1.318, 1.319, 1.322, 1.3215, 1.3232, 1.3172))

I'm using stat_regline_equation from the ggpubr package to print out the regression, and I'm also printing out a reference line of y=x using geom_abline. The code for my plot is shown below:

ggplot(df,  aes(x=A,  y=B)) +
    geom_point() +
    geom_abline(slope=1,  intercept=0) +
    geom_smooth(color='red',  method = lm,  se = FALSE,  formula = y~x) +
    stat_regline_equation(label.x=1.31,  label.y=1.325) +
    coord_equal()

The coord_equal command forces the scaling of the two axes to be equal.

Typical Plot with regression and y=x line

This is all well and good, but what I am needing to do is find the regression equation from this set of data if the slope of the line is fixed at 1 i.e. I'm hoping for an equation that looks like y=b + 1x where b is the intercept if the regression is forced to have a slope m=1.

This is a small example; my full data-set will likely have multiple similar plots generated wtih facet_wrap or facet_grid, so I'm definitely hoping there's a way to do this such that this forced-slope regression can be repeated across multiple panels.

From a practical standpoint, the data represent measurements of several items measured on Tool A and Tool B. So this is a correlation plot between the two tools. Ideally, they would measure identically, but it's clear that Tool B measures slightly higher than Tool A. The regression equation gives me that correlation, but I'd prefer to assume a linear offset (which is reasonable over a small range). Hence my desire to fix the regression at m=1 and calculate the intercept, which will effectively be the mean offset.

I'm not locked into geom_smooth or stat_regline_equation, so if there are better regression tools available compatible with ggplot2 that can help, I'm all ears.

Thanks!


Solution

  • library(ggplot2)
    library(dplyr)
    
    set.seed(3)
    ## adding a type column to showcase facets
    df %>% 
      mutate(type = rnorm(n()) < 0,
             A = A + abs(rnorm(n())),
             B = B + abs(rnorm(n())))  -> df_type
    
    ## lm(y ~ 1, offset = x) to get slope = 1
    df_type %>% 
      summarize(intercept = lm(B ~ 1, offset = A)$coefficients,
                .by = type) %>% 
      mutate(sign = ifelse(sign(intercept) == 1, "+", "-")) -> df_lm
    
    
    
    ggplot(df_type, aes(x=A,  y=B)) +
      geom_point() +
      geom_abline(slope=1,  intercept=0) +
      geom_smooth(color='red',  method = lm,  se = FALSE,  
                  formula = y ~ 1 + offset(x*1)) +
      geom_label(data = df_lm,
                 aes(x = -Inf, y = Inf, 
                     label = paste("y = x", sign, 
                                   abs(round(intercept, 5)))),
                 hjust = -0.25, vjust = 2.5) + 
      facet_wrap(~type) +
      coord_equal()
    
    

    If you were doing it without faceting:

    ggplot(df, aes(x = A, y = B)) +
      geom_point() +
      geom_abline(slope = 1,  intercept = 0) +
      geom_smooth(color='red',  method = lm,  se = FALSE,  
                  formula = y ~ 1 + offset(x*1)) +
      geom_label(aes(x = 1.315, y = 1.325, 
                     label = paste("y = x", sign, 
                                   abs(round(intercept, 4)))),
                 inherit.aes = FALSE, 
                 data = summarize(df, 
                                intercept = lm(B ~ 1, offset = A)$coefficients,
                                sign = ifelse(sign(intercept) == 1, "+", "-"))) + 
      coord_equal()