rlinear-regressionscatter-plotstandard-error

Plot standard error in base r scatterplot


I am well aware of how to plot the standard error of a regression using ggplot. As an example with the iris dataset, this can be easily done with this code:

library(tidyverse)

iris %>% 
  ggplot(aes(x=Sepal.Width,
             y=Sepal.Length))+
  geom_point()+
  geom_smooth(method = "lm",
              se=T)

enter image description here

I also know that a regression using base R scatterplots can be achieved with this code:

#### Scatterplot ####
plot(iris$Sepal.Width,
     iris$Sepal.Length)

#### Fit Regression ####
fit <- lm(iris$Sepal.Length ~ iris$Sepal.Width)

#### Fit Line to Plot ####
abline(fit, col="red")

enter image description here

However, I've tried looking up how to plot standard error in base R scatterplots, but all I have found both on SO and Google is how to do this with error bars. However, I would like to shade the error in a similar way as ggplot does above. How can one accomplish this?

Edit

To manually obtain the standard error of this regression, I believe you would calculate it like so:

#### Derive Standard Error ####
fit <- lm(Sepal.Length ~ Sepal.Width,
          iris)
n <- length(iris)
df <- n-2 # degrees of freedom
y.hat <- fitted(fit)
res <- resid(fit)
sq.res <- res^2
ss.res <- sum(sq.res)
se <- sqrt(ss.res/df)

So if this may allow one to fit it into a base R plot, I'm all ears.


Solution

  • Here's a slightly fiddly approach using broom::augment to generate a dataset with predictions and standard errors. You could also do it in base R with predict if you don't want to use broom but that's a couple of extra lines.

    Note: I was puzzled as to why the interval in my graph are narrower than your ggplot interval in the question. But a look at the geom_smooth documentation suggests that the se=TRUE option adds a 95% confidence interval rather than +-1 se as you might expect. So its probably better to generate your own intervals rather than letting the graphics package do it!

    #### Fit Regression (note use of `data` argument) ####
    fit <- lm(data=iris, Sepal.Length ~ Sepal.Width)
    
    #### Generate predictions and se ####
    dat <- broom::augment(fit, se_fit = TRUE)
    
    #### Alternative using `predict` instead of broom ####
    dat <- cbind(iris, 
                 .fitted=predict(fit, newdata = iris), 
                 .se.fit=predict(fit, newdata = iris, se.fit = TRUE)$se.fit)
    
    #### Now sort the dataset in the x-axis order
    dat <- dat[order(dat$`Sepal.Width`),]
    
    #### Plot with predictions and standard errors
    with(dat, {
      plot(Sepal.Width,Sepal.Length)
      polygon(c(Sepal.Width, rev(Sepal.Width)), c(.fitted+.se.fit, rev(.fitted-.se.fit)), border = NA, col = hsv(1,1,1,0.2))
      lines(Sepal.Width,.fitted, lwd=2)
      lines(Sepal.Width,.fitted+.se.fit, col="red")
      lines(Sepal.Width,.fitted-.se.fit, col="red")
      })
    

    enter image description here