rfunctionr-mice

R+mice: I want to write a function for extracting information from a mice mids object after imputation


I have a mice mids object with approx 300 variables and I want to write a function to get information, as a simple example, Fraction of Missing Information (FMI) for each variable in a group of variables without writing out the fitting steps one variable at a time. I can write functions that return 'something' that looks like mice pooled estimate output but not the same results I get in open code, i.e., one variable at a time.

I think the following is a reproducible example.

### some blank lines deleted
library(mice)

## <masking warnings>

### simple example from JSS article on mice 
imp <- mice(nhanes, seed = 23109) 
 iter imp variable
  1   1  bmi  hyp  chl
  1   2  bmi  hyp  chl
  1   3  bmi  hyp  chl
  1   4  bmi  hyp  chl
  1   5  bmi  hyp  chl
  .....
### standard way to fit and get FMI plus other info
fit <- with(imp, lm(chl ~ 1))
print(pool(fit))
Class: mipo    m = 5 
         term m estimate     ubar       b       t dfcom       df       riv    lambda       fmi
1 (Intercept) 5  **192.944** 65.01424 5.91888 72.1169    24 19.10544 0.1092477 0.0984881 **0.1800528**

silly function I wrote

why.wont.this.work <- function(dsn, varname){
  pars <- as.list(match.call()[-1]) ### match call dsn
 var2use <- dsn$data[, as.character(pars$varname)]
 fit <- with(dsn,  lm(var2use ~ 1))
 return(pool(fit))  
}
why.wont.this.work(imp,  chl)
Class: mipo    m = 5 
         term m estimate     ubar b        t dfcom       df riv lambda       fmi
1 (Intercept) 5    **191.4** 136.2933 0 136.2933    14 12.35171   0      **0 0.1302787**

A check by hand indicates the value for the intercept really is 192.944 and the FMI really is 0.18. Somehow the silly function recognizes that the imp object is a multiple imputation object but the calculations are not correct. My question is how do I write a function that returns mathematically correct values?


Solution

  • This was quite tricky because of the different ways that with and lm() are using environments/evaluating objects. This seems to work:

    myfun <- function(dsn, varname) {
       fit <- eval(bquote(with(dsn, lm(.(substitute(varname)) ~ 1))))
       return(pool(fit))
    }
    myfun(imp, chl)
    

    (I tried to use reformulate("1", response = deparse(substitute(varname))) to construct the lm formula, but that doesn't work — for reasons I couldn't easily figure out ...)

    Note that it's still tricky to use with() with lm() even outside of a function, and even when the example doesn't involve mice ("rodent free"):

    ## don't need substitute() since we're not processing a formula argument
    varname <- quote(chl)
    form <- reformulate("1", response = deparse(varname))
    try(with(mice::nhanes, lm(form))) ## object 'chl' not found
    eval(bquote(with(mice::nhanes, lm(.(varname) ~ 1))))  ## works
    

    However, the data= argument to lm does get around this problem ...

    lm(form, data = mice::nhanes)