rlmrms

Predicted values for new data using restricted cubic splines


I have some data that I'm modeling using restricted cubic splines. I'm using the rcs transformation function in the rms package to generate the transformed variables for a linear model. Here is an example using 5 knots.

library('rms')

my_df <- data.frame(
    y = -4 * -100:100 + -1.5 * (-100:100)**2 + 3 * (-100:100)**3 + rnorm(201, 0, 1e5),
    x = -100:100
)

mod <- lm(y ~ rcs(x, 5), data = my_df)

After I fit the data, I'd like to find the predicted y values for a specific domain of x values. Here is what I'm doing now:

new_data <- data.frame(x = -3:3)

predict(mod, newdata = new_data)

However, this generates a warning message:

Warning message:
In rcspline.eval(x, nk = nknots, inclx = TRUE, pc = pc, fractied = fractied) :
    5 knots requested with 7 unique values of x.  knots set to 5 interior values.

What does this mean, and what is going on? I expected that the knot locations should already be defined in mod, so I don't understand why it seems to be trying to find new knots for the seven x values that I give it. I can avoid the warning message by providing more x values in new_data, and just ignoring the ones I don't need, but I am concerned about what predict is actually doing.


Solution

  • According to Hadley's comment on this question you shouldn't expect lm to work with rcs. A quick demonstration why there's a problem:

    mod <- lm(y ~ rcs(x, 5), data = my_df)
    
    new_data <- data.frame(x = -3:3)
    new_data2 <- data.frame(x = -300:300/100)
    
    plot(new_data2$x, predict(mod, newdata = new_data2), type='l')
    lines(new_data$x,predict(mod, newdata = new_data), col='red')
    

    Graph produced as code output

    The predictions vary depending on the number of x-values, even for the same range, so definitely not a good idea to combine these functions.