Suppose you are working on a regression model and at least one of the predictors is estimated via splines, e.g.,
library(splines)
data(diamonds, package = "ggplot2")
fit <- lm(price ~ bs(depth, degree = 5) + bs(carat, knots = c(2, 3)) * color,
data = diamonds)
The above fit is for illustrative purposes and has no meaningful reason for being.
Now, lets keep the same basic formula, but change the knot locations for both depth and carat. The update needs to take place in a dynamic way such that it could be part of a bigger MCMC method (number of knots and knot locations to be determined either by either a reversible jump or birth/death step).
I'm well aware of the update
and update.formula
calls, but I don't believe
that these tools will help. The following pseudo code should illustrate the
behavior of the function I'm planning to develop.
foo <- function(formula, data) {
# Original Model matrix, the formula will be of the form:
Xmat_orig <- model.matrix(formula, data)
# some fancy method for selecting new knot locations here
# lots of cool R code....
# pseudo code for the 'new knots'. In the example formula above var1 would be
# depth and var2 would be carat. The number of elements in this list would be
# dependent on the formula passed into foo.
new_knots <- list(k1 = knot_locations_for_var1,
k2 = knot_locations_for_var2)
# updated model matrix:
# pseudo code for that the new model matrix call would look like.
Xmat_new <-
model.matrix(y ~ bs(var1, degree = 5, knots = new_knots$k1) + bs(var2, knots = new_knots$k2) * color,
data = data)
return(Xmat_new)
}
Can someone suggest a way to modify the knots
call within either a bs
or
ns
call dynamically?
Here's another possibility that isn't as picky about what is input to your function. Consider this
newknots <- function(form, data, calls=c("bs","ns")) {
nk <- function(x) {
sort(runif(sample(1:5, 1), min = min(data[[x]]), max = max(data[[x]])))
}
rr <- function(x, nk, calls) {
if(is.call(x) && deparse(x[[1]]) %in% calls) {
x$knots = nk(deparse(x[[2]]))
x
} else if (is.recursive(x)) {
as.call(lapply(as.list(x), rr, nk, calls))
} else {
x
}
}
z <- lapply(as.list(form), rr, nk, calls)
z <- eval(as.call(z))
environment(z) <- environment(form)
z
}
So this isn't exactly a trivial function, but hopefully it's not too bad. The idea is that we can convert a formula to a list object that we can recursively investigate. That's what the internal rr
function is doing. It takes a list, and then looks at each of the elements. It looks for calls to bs
or ns
and when it finds them, it replaces the knots=
parameter.
Here we use the kn
function to create a new set of knots for a given variable name which is passed in as a string. We simply need to return a list of values appropriate for that variable.
Finally I need to turn the list back into a formula and I make sure our new object has the same environment as the original formula. So this actually does create a new formula object leaving the original intact (you may replace the original value if you wish).
Here's an example of how you would call/use this function.
f <- price ~ ns(carat, knots=c(1,3)) * color + bs(depth, degree = 5) + clarity
newknots(f, diamonds)
# price ~ ns(carat, knots = c(2.09726121873362, 3.94607368792873
# )) * color + bs(depth, degree = 5, knots = c(44.047089480795,
# 47.8856966942549, 49.7632855847478, 70.9297015387565)) + clarity
So you can see that the knots were added and replaced according to our new rule. I'm not sure what other features you may need, but hopefully this will give you a good starting point.