I have a mice mids
object with approx 300 variables and I want to write a function to get information, as a simple example, Fraction of Missing Information (FMI) for each variable in a group of variables without writing out the fitting steps one variable at a time. I can write functions that return 'something' that looks like mice pooled estimate output but not the same results I get in open code, i.e., one variable at a time.
I think the following is a reproducible example.
### some blank lines deleted
library(mice)
## <masking warnings>
### simple example from JSS article on mice
imp <- mice(nhanes, seed = 23109)
iter imp variable
1 1 bmi hyp chl
1 2 bmi hyp chl
1 3 bmi hyp chl
1 4 bmi hyp chl
1 5 bmi hyp chl
.....
### standard way to fit and get FMI plus other info
fit <- with(imp, lm(chl ~ 1))
print(pool(fit))
Class: mipo m = 5
term m estimate ubar b t dfcom df riv lambda fmi
1 (Intercept) 5 **192.944** 65.01424 5.91888 72.1169 24 19.10544 0.1092477 0.0984881 **0.1800528**
why.wont.this.work <- function(dsn, varname){
pars <- as.list(match.call()[-1]) ### match call dsn
var2use <- dsn$data[, as.character(pars$varname)]
fit <- with(dsn, lm(var2use ~ 1))
return(pool(fit))
}
why.wont.this.work(imp, chl)
Class: mipo m = 5
term m estimate ubar b t dfcom df riv lambda fmi
1 (Intercept) 5 **191.4** 136.2933 0 136.2933 14 12.35171 0 **0 0.1302787**
A check by hand indicates the value for the intercept really is 192.944 and the FMI really is 0.18. Somehow the silly function recognizes that the imp object is a multiple imputation object but the calculations are not correct. My question is how do I write a function that returns mathematically correct values?
This was quite tricky because of the different ways that with
and lm()
are using environments/evaluating objects. This seems to work:
myfun <- function(dsn, varname) {
fit <- eval(bquote(with(dsn, lm(.(substitute(varname)) ~ 1))))
return(pool(fit))
}
myfun(imp, chl)
(I tried to use reformulate("1", response = deparse(substitute(varname)))
to construct the lm
formula, but that doesn't work — for reasons I couldn't easily figure out ...)
Note that it's still tricky to use with()
with lm()
even outside of a function, and even when the example doesn't involve mice
("rodent free"):
## don't need substitute() since we're not processing a formula argument
varname <- quote(chl)
form <- reformulate("1", response = deparse(varname))
try(with(mice::nhanes, lm(form))) ## object 'chl' not found
eval(bquote(with(mice::nhanes, lm(.(varname) ~ 1)))) ## works
However, the data=
argument to lm
does get around this problem ...
lm(form, data = mice::nhanes)