In the following example, I create a function to fit a glm, but the function cannot find the formula defined immediately before. I believe this has to do with the function looking in the wrong environment, but I can't understand why. Here is an example:
n <- 20
ncov <- 3
df <- as.data.frame(replicate(ncov+1, runif(n)))
names(df) <- c(paste0("x", seq(ncov)), "y")
df
fun1 <- function(mod, pTrain = 0.5){
  print(environment())
  data <- mod$data
  y <- mod$y
  train <- sample(nrow(data), size = nrow(data)*pTrain)
  valid <- -train
  modTrain <- update(object = mod, data = data[train,])
  yhat <- predict(modTrain, newdata = data[valid,])
  # keep only the held-out observations so y and yhat line up
  res <- data.frame(y = y[valid], yhat = yhat)
  return(res)
}
fun2 <- function(useCovs = c(1,0,0), data = df){
  print(environment())
  fmla <- formula(paste("y ~", paste(paste0("x", seq(useCovs))[as.logical(useCovs)], collapse = " + ")))
  # environment(fmla) <- environment() # does not help
  mod <- glm(formula = fmla, data = data)
  res <- fun1(mod, pTrain = 0.5)
  score <- sqrt(mean((res$y - res$yhat)^2))
  return(c(aic = AIC(mod), rmse = score))
}
fmla <- NULL # just to be sure no formula is left over in the global environment
fun2(useCovs = c(1,0,1))
# Error in eval(mf, parent.frame()) : object 'fmla' not found
If I use a <<- assignment for the formula, the function works, but I worry about the potential issues with this:
fun3 <- function(useCovs = c(1,0,0), data = df){
  print(environment())
  fmla <<- formula(paste("y ~", paste(paste0("x", seq(useCovs))[as.logical(useCovs)], collapse = " + ")))
  mod <- glm(formula = fmla, data = data)
  res <- fun1(mod, pTrain = 0.5)
  score <- sqrt(mean((res$y - res$yhat)^2))
  return(c(aic = AIC(mod), rmse = score))
}
fmla <- NULL # just to be sure no formula is left over in the global environment
fun3(useCovs = c(1,0,1)) # works
fmla # no longer NULL: <<- has overwritten the global fmla with the formula
Inspired by this post (in particular, the answer that has not been accepted), the following change to fun1 seems to solve the problem:
fun1 <- function(mod, pTrain = 0.5){
  data <- mod$data
  y <- mod$y
  train <- sample(nrow(data), size = nrow(data)*pTrain)
  valid <- -train
  # New code
  ev <- environment()
  parent.env(ev) <- environment(mod$formula)
  environment(mod$formula) <- ev
  # End of new code
  modTrain <- update(object = mod, data = data[train,])
  yhat <- predict(modTrain, newdata = data[valid,])
  # keep only the held-out observations so y and yhat line up
  res <- data.frame(y = y[valid], yhat = yhat)
  return(res)
}
I cannot explain why, though the discussion in the accepted answer above is probably worth some study.
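A possible explanation, offered as a sketch rather than a definitive account: update() does not reuse the fitted formula object. It takes the call stored in the model, which still contains the bare symbol fmla, and re-evaluates that call in the frame update() was called from. Inside fun1 that symbol is not visible, whereas re-parenting fun1's frame onto the formula's environment makes it visible again. The names below (d, fit_in_function, refit, refit_fixed) are hypothetical and exist only to illustrate the lookup.

```r
# Hypothetical toy data, not the df from the question
set.seed(1)
d <- data.frame(x1 = runif(20), y = runif(20))

# Fit a glm from a formula that only exists inside this function
fit_in_function <- function(data) {
  fmla <- y ~ x1
  glm(formula = fmla, data = data)
}
mod <- fit_in_function(d)

# The stored call keeps the symbol, not the formula it pointed to
stored <- getCall(mod)
print(stored)                              # glm(formula = fmla, data = data)
symbol_kept <- is.symbol(stored$formula)

# update() re-evaluates that call in the caller's frame, where fmla is unknown
refit <- function(mod) {
  data <- mod$data
  update(mod, data = data[1:10, ])
}
msg <- tryCatch({ refit(mod); "no error" },
                error = function(e) conditionMessage(e))

# Making the formula's environment the parent of this frame lets the
# lookup of fmla succeed (same idea as the "New code" lines above)
refit_fixed <- function(mod) {
  data <- mod$data
  ev <- environment()
  parent.env(ev) <- environment(mod$formula)
  update(mod, data = data[1:10, ])
}
ok <- refit_fixed(mod)
```

Note that the help page for parent.env describes the replacement form parent.env<- as extremely dangerous, so this is better read as a diagnostic than as a recommended pattern.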
As I mentioned in my comment, amending the signature of fun1 to
fun1 <- function(mod, pTrain = 0.5, fmla)
and the call to it in fun2 to
res <- fun1(mod, pTrain = 0.5, fmla)
also succeeds.
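For completeness, here is a self-contained sketch of that variant. It presumably works because fmla now exists as an argument in fun1's own frame, which is where update() re-evaluates the stored glm call. The set.seed() call, the use of seq_along(), and the subsetting y[valid] are my additions for reproducibility and to compare held-out rows only; they are not part of the original code.

```r
set.seed(42)                       # assumption: added only for reproducibility
n <- 20
ncov <- 3
df <- as.data.frame(replicate(ncov + 1, runif(n)))
names(df) <- c(paste0("x", seq(ncov)), "y")

# fmla is never used explicitly in the body; it only needs to be visible
# in this frame when update() re-evaluates glm(formula = fmla, ...)
fun1 <- function(mod, pTrain = 0.5, fmla) {
  data <- mod$data
  train <- sample(nrow(data), size = nrow(data) * pTrain)
  valid <- -train
  modTrain <- update(object = mod, data = data[train, ])
  yhat <- predict(modTrain, newdata = data[valid, ])
  data.frame(y = mod$y[valid], yhat = yhat)
}

fun2 <- function(useCovs = c(1, 0, 0), data = df) {
  covs <- paste0("x", seq_along(useCovs))[as.logical(useCovs)]
  fmla <- formula(paste("y ~", paste(covs, collapse = " + ")))
  mod <- glm(formula = fmla, data = data)
  res <- fun1(mod, pTrain = 0.5, fmla = fmla)
  c(aic = AIC(mod), rmse = sqrt(mean((res$y - res$yhat)^2)))
}

out <- fun2(useCovs = c(1, 0, 1))
print(out)
```

Compared with the <<- approach, nothing is written to the global environment, at the cost of an argument whose only purpose is to be found by name.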