I am using MGCV to generate predictions for a GAM using a cubic spline forced through a certain point. I have generated the gam exactly as outlined in this post: https://stat.ethz.ch/pipermail/r-help/2013-March/350253.html
However, when I try to predict y for new values of x within the range of the gam, I get this error: Error: variable 'X' was fitted with type "nmatrix.8" but type "numeric" was supplied.
Here is the script I’m using to try to make the predictions:
## Fake some data...
library(mgcv)
set.seed(0)
n <- 100
x <- runif(n)*4-1;x <- sort(x);
f <- exp(4*x)/(1+exp(4*x));y <- f+rnorm(100)*0.1;plot(x,y)
dat <- data.frame(x=x,y=y)
## Create a spline basis and penalty, making sure there is a knot
## at the constraint point, (0 here, but could be anywhere)
knots <- data.frame(x=seq(-1,3,length=9)) ## create knots
## set up smoother...
sm <- smoothCon(s(x,k=9,bs="cr"),dat,knots=knots)[[1]]
## 3rd parameter is value of spline at knot location 0,
## set it to 0 by dropping...
X <- sm$X[,-3] ## spline basis
S <- sm$S[[1]][-3,-3] ## spline penalty
off <- y*0 + .6 ## offset term to force curve through (0, .6)
## fit spline constrained through (0, .6)...
b <- gam(y ~ X - 1 + offset(off),paraPen=list(X=list(S)))
lines(x,predict(b))
library(tidyverse)
# Predict values across concentrations within measured range
x_seq <- seq(-1, 3, length.out = 10000)
newdata <- data.frame(X = x_seq)
# Generate the spline basis matrix for the new data
sm_new <- smoothCon(s(X, k=9, bs="cr"), newdata, knots=knots)[[1]]
X_new <- sm_new$X[,-3] # Match the basis matrix structure used in the model
# Create a data frame with the spline basis matrix
X_new_df <- as.data.frame(X_new)
# Predict the offset term for the new data
off_new <- y*0 + .6 # Use the same offset as in the original model
# Combine the new data with the basis matrix and the offset
predict_data <- cbind(newdata, X_new_df, off_new) %>%
rename(off = off_new)
# Predict values using the GAM
predictions <- predict(b, newdata = predict_data, se.fit = TRUE)
I have also tried to put the variables required for a function into a list so they can be called as an argument to ‘data’, but this creates a separate issue wherein the matrix (X) is not recognised: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 27, 8
Here is the adjusted script I use to build this model, which generates the error:
list.dat <- list(y = y,
X = X,
S = S,
off = off)
b <- gam(y ~ X - 1 + offset(off),
data = list.dat, paraPen=list(X=list(S)))
Can you help me generate predictions for y along more values of x than in the original data? I would prefer to give my variables to the model as a list in the data argument, but I am open to other suggestions too.
You can now do this directly with s()
via its pc
argument (for "point constraint"):
library(mgcv)
set.seed(0)
n <- 100
x <- runif(n) * 4 - 1
x <- sort(x)
f <- exp(4 * x) / (1 + exp(4 * x))
y <- f + rnorm(100) * 0.1
dat <- data.frame(x = x, y = y)
m <- gam(y ~ s(x, pc = 0), data = dat, method = "REML")
Then predict()
etc works as expected:
library("gratia")
ds <- data_slice(m, x = evenly(x))
fv <- fitted_values(m, data = ds) # calls predict
fv
# A tibble: 100 × 6
.row x .fitted .se .lower_ci .upper_ci
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 -0.946 0.0364 0.0526 -0.0666 0.140
2 2 -0.907 0.0375 0.0463 -0.0534 0.128
3 3 -0.867 0.0386 0.0406 -0.0410 0.118
4 4 -0.828 0.0402 0.0358 -0.0299 0.110
5 5 -0.788 0.0424 0.0319 -0.0201 0.105
6 6 -0.749 0.0455 0.0292 -0.0118 0.103
7 7 -0.709 0.0497 0.0277 -0.00467 0.104
8 8 -0.670 0.0554 0.0272 0.00202 0.109
9 9 -0.630 0.0627 0.0273 0.00923 0.116
10 10 -0.591 0.0720 0.0276 0.0179 0.126
# ℹ 90 more rows
# ℹ Use `print(n = ...)` to see more rows
And we can confirm the point constraint via a plot of the smooth:
draw(m) # or plot(m)