rggplot2splines

Errors Plotting a Restricted Cubic Spline with ggplot2


I would like to use ggplot2 to illustrate a fit using a restricted cubic spline using geom_smooth() but it seems to be working incorrectly. Here is a short example:

# rms package Contains Restricted Cubic Splines (RCS)
library(rms)
library(ggplot2)

# Load Data
data(cars)

# Model Fit with RCS
fit <- lm(speed ~ rcs(dist, 5), data=cars)

# Obtain Diagnostic Data
plot.dat <- cbind(cars, fitted=fitted(fit))

# Compare Smooth to Actual
ggplot(data=plot.dat) +
  geom_point(aes(x=dist, y=speed)) +
  geom_smooth(aes(x=dist, y=speed), method="lm", 
              formula=y ~ rcs(x, 5), se=FALSE, colour="blue") +
  geom_line(aes(y=fitted, x=dist), size=1.25, colour="red")

This results in the following image: Comparison of Splines I am not sure why geom_smooth() is not giving the correct results. Clearly there is a work-around (as illustrated above), but is there a way to make geom_smooth() produce the correct results?


Solution

  • It has been a long time, but I finally recognized the problem, and I thought I would post it here for those interested. Internally, geom_smooth() will create a sequence of the predictor at which to plot the predicted response. As this sequence is spaced out across the range of the x-axis, the knot points selected by rcs() inside of geom_smooth() will differ from the knot points selected by rcs() on the original data. To address this, you need to pass in the correct knot points.

    # rms package Contains Restricted Cubic Splines (RCS)
    library(rms)
    library(ggplot2)
    
    # Load Data
    data(cars)
    
    # Model Fit with RCS
    fit <- lm(speed ~ rcs(dist, 5), data=cars)
    
    # Obtain Diagnostic Data
    plot.dat <- cbind(cars, fitted=fitted(fit))
    
    # Compare Smooth to Actual
    ggplot(data=plot.dat) +
      geom_point(aes(x=dist, y=speed)) +
      geom_smooth(aes(x=dist, y=speed), method="lm", 
                  formula=y ~ rcs(x, quantile(plot.dat$dist, probs = c(0.05, 0.275, 0.5, 0.725, 0.95))), se=FALSE, colour="blue") +
      geom_line(aes(y=fitted, x=dist), colour="red")