
Getting Formulas for smooth.splines in R


I have used a stats::smooth.spline function to fit a dataset of 60 (x, y) pairs, and now I need to access the formula which can generate predictions for me, while I have access to the model. Unfortunately the documentation here doesn't help me figure this out (and also appears slightly out of date).

My understanding is that smooth.spline provides a single function g: Reals -> Reals where g is a cubic polynomial satisfying g = argmin(g) spar * SUM(MSE) + lambda * Integral [g''(x)]^2 dx. If this is the case, I'd love to see an output of g(x) = x^3 + 5x^2 + 10x + 15 (though the resulting object seems to only contain spar and lambda).

It's also possible that smooth.spline fits multiple cubic pieces along different parts of the data, in which case my desired output would be a piecewise definition of g(x), with one polynomial per interval.

Here's some example code:

dataset <- data.frame(x = c(1,2,3,4,5,6,7), y = c(10,8,4,6,8,11,15))
spline_result <- smooth.spline(dataset$x, y = dataset$y)
# Plotting the spline looks like a parabola centered near x=3.5 y=5
# so I'd expect something like g(x) = k *(x - 3.5) ^ 2 + 5
# where k is some constant

Additional information: I'm running R version 4.1.2 (2021-11-01) "Bird Hippie", with packageVersion("stats") 4.1.2.

Thanks for the help!


Solution

  • The smooth.spline function produces a "natural spline", by default with knots at each unique x value when there are fewer than 50 of them (for larger data sets it uses a subset of the x values unless you set all.knots = TRUE). That means it is a linear function outside the range of x and a cubic polynomial between each pair of adjacent knots. So for your 7-point example you'll have 2 linear equations and 6 cubic equations. This is usually not a useful way to work with such a function, because the terms of the power basis often nearly cancel each other out, so there could be a lot of rounding error when you try to evaluate it.
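
    If you want to convince yourself of that piecewise structure, here is a rough check on the 7-point example. It only uses predict() and its deriv argument; the evaluation points (8, 9, 10 and so on) are arbitrary choices of mine, not anything special.

    ```r
    dataset <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7), y = c(10, 8, 4, 6, 8, 11, 15))
    spline_result <- smooth.spline(dataset$x, y = dataset$y)

    # Outside the range of x the fit should be a straight line, so the second
    # differences of equally spaced predictions should vanish (up to rounding).
    right <- predict(spline_result, x = c(8, 9, 10))$y
    left  <- predict(spline_result, x = c(-2, -1, 0))$y
    second_diff_right <- diff(right, differences = 2)
    second_diff_left  <- diff(left, differences = 2)
    print(c(second_diff_left, second_diff_right))

    # Between two adjacent knots the fit is a single cubic, so its third
    # derivative should be the same at any two points inside that interval.
    d3 <- predict(spline_result, x = c(3.25, 3.75), deriv = 3)$y
    print(d3)
    ```

    The second differences should print as essentially zero, and the two third-derivative values should agree with each other.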

    If you are just interested in evaluating the spline at various points that weren't in your original x vector, use the predict() function, e.g.

    dataset <- data.frame(x = c(1,2,3,4,5,6,7), y = c(10,8,4,6,8,11,15))
    spline_result <- smooth.spline(dataset$x, y = dataset$y)
    
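    # Evaluate the fitted spline on a grid that extends beyond the data
    newx <- seq(0, 10, length.out = 100)
    newvals <- predict(spline_result, x = newx)  # list with components x and y
    plot(newvals, type = 'l')
    points(dataset)  # overlay the original observations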
    

    Created on 2021-12-09 by the reprex package (v2.0.1)

    The predict() function avoids the rounding error because it does not use the power basis for the polynomials; it evaluates the fit in the B-spline basis in which it is stored.
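
    If what you need is something that behaves like g(x), a small wrapper around predict() may be enough. The sketch below also peeks at the stored B-spline representation in spline_result$fit. The name g is just my choice, and my reading of the fit components (knots kept on a rescaled [0, 1] axis, with fit$min and fit$range recording the original scale) is an assumption from inspecting the object, so treat it as such.

    ```r
    dataset <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7), y = c(10, 8, 4, 6, 8, 11, 15))
    spline_result <- smooth.spline(dataset$x, y = dataset$y)

    # A closure that acts as g(x); it simply delegates to predict().
    g <- function(x) predict(spline_result, x = x)$y
    print(g(c(3.5, 4.25)))

    # The stored representation: B-spline knots and coefficients.
    fit <- spline_result$fit
    print(fit$nk)           # number of B-spline coefficients
    print(length(fit$coef)) # should match fit$nk
    print(range(fit$knot))  # knots appear to be kept on a rescaled axis
    ```

    As long as the fitted object is available (for example after saveRDS() and readRDS()), g can be called on any numeric vector.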

    If you really want the polynomial coefficients, one way to get them is to use polynomial regression on the predictions. For example, to find the coefficients for the segment between 3 and 4, you could use

    lm(y ~ poly(x, degree = 3, raw = TRUE), data = predict(spline_result, x = seq(3, 4, len = 10)))
    #> 
    #> Call:
    #> lm(formula = y ~ poly(x, degree = 3, raw = TRUE), data = predict(spline_result, 
    #>     x = seq(3, 4, len = 10)))
    #> 
    #> Coefficients:
    #>                      (Intercept)  poly(x, degree = 3, raw = TRUE)1  
    #>                          26.3378                          -14.9943  
    #> poly(x, degree = 3, raw = TRUE)2  poly(x, degree = 3, raw = TRUE)3  
    #>                           3.2874                           -0.2059
    


    This gives the polynomial 26.3378 - 14.9943 x + 3.2874 x^2 - 0.2059 x^3, which is valid only for x between 3 and 4.
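
    An alternative to the regression, if you would rather not fit anything, is to read the coefficients off the derivatives. This is only a sketch of that idea: it asks predict() for derivatives 0 to 3 at the midpoint of the segment (I use the midpoint because the third derivative jumps at the knots), builds the Taylor expansion there, and then expands it into powers of x. The names x0, local_coef and power_coef are mine.

    ```r
    dataset <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7), y = c(10, 8, 4, 6, 8, 11, 15))
    spline_result <- smooth.spline(dataset$x, y = dataset$y)

    # On [3, 4] the fit is one cubic. Expand it about the midpoint x0:
    # g(x) = a0 + a1 (x - x0) + a2 (x - x0)^2 + a3 (x - x0)^3
    x0 <- 3.5
    derivs <- sapply(0:3, function(k) predict(spline_result, x = x0, deriv = k)$y)
    local_coef <- derivs / factorial(0:3)  # Taylor coefficients a0..a3

    # Expand the powers of (x - x0) to get coefficients of 1, x, x^2, x^3.
    a <- local_coef
    power_coef <- c(
      a[1] - a[2] * x0 + a[3] * x0^2 - a[4] * x0^3,
      a[2] - 2 * a[3] * x0 + 3 * a[4] * x0^2,
      a[3] - 3 * a[4] * x0,
      a[4]
    )
    print(power_coef)

    # Compare the reconstructed cubic with predict() on the same segment.
    xs <- seq(3, 4, length.out = 11)
    poly_vals <- power_coef[1] + power_coef[2] * xs + power_coef[3] * xs^2 + power_coef[4] * xs^3
    max_abs_diff <- max(abs(poly_vals - predict(spline_result, x = xs)$y))
    print(max_abs_diff)
    ```

    The printed coefficients should land close to the lm() ones above. The local form in local_coef is usually the safer one to evaluate, since it avoids most of the cancellation that the raw power basis suffers from.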