With the mice package, how to create a list of models from a list of variables to test with glm


I want to run a t-test (or chi^2 test) to compare each variable between grou=0 and grou=1. All variables in the dataset have been imputed with mice. The variables include AGE, SCORE, GENDER, HEART, etc.; AGE and SCORE are continuous, while GENDER and HEART are categorical.

If the t-test is done for only one variable at a time, I know the code is:

library(mice)
data_im<-mice(data, m=5,seed=6666)
summary(pool(with(data_im,glm(AGE~grou))))

The p-value reported for grou in this output matches the p-value of the (equal-variance) t-test.

However, I need to evaluate many variables, so I would like to write a for loop or a function that outputs the summary test results for multiple variables at once.

I have tried to write:

vars <- c("AGE","SCORE","GENDER","HEART")
afterMICE <- c()
for(i in 1:4){
  pool_fitMICE <- pool(with(data_im,glm(substitute(y ~ grou,list(y=as.name(vars[i]))))))
  afterMICE <- rbind(afterMICE,c(vars[i],coef(summary(pool_fitMICE))[2,c(1,2,4)]))
}

**Error in eval(predvars, data, env) : object 'AGE' not found**

I believe the error occurs because data_im is a mids object rather than a regular data frame.

How can I modify the code to output the summary results for several variables in one batch?

#------------------------------------------------------------------------------

Edit: mice() is a function to do multivariate imputation for missing data. https://www.rdocumentation.org/packages/mice/versions/3.15.0/topics/mice

Take data("nhanes2") as an example to see the structure of data_im.

library(mice)
data("nhanes2")
vars=c("bmi","chl","age","hyp")
catvars=c("age","hyp")
data_im=mice(nhanes2,m=5,seed=6666)
pool.fits <- pool(with(data_im, glm(age~hyp)))

But we need to pool the results of pool(with(data_im, glm(vars~hyp))) in a batch: each variable in vars takes its turn as the dependent variable in glm(), with hyp as the independent variable.


Solution

  • You don't actually need eval and substitute in this situation. You can build the formula in the usual way (for example, by passing a string to as.formula), and when the formula is evaluated inside with(), the variables are looked up in each imputed dataset automatically.

    library(mice)
    set.seed(123)
    vars <- c("age", "bmi", "chl")
    imps <- mice(nhanes, printFlag = FALSE)
    mods <- list()
    for(i in seq_along(vars)) {
      mods[[i]] <- pool(with(imps, 
                             glm(as.formula(paste0(vars[[i]], "~ hyp")))))
    }
    names(mods) <- vars
    lapply(mods, summary)
    #> $age
    #>          term  estimate std.error statistic       df   p.value
    #> 1 (Intercept) 0.5216374 0.4873196  1.070422 17.71793 0.2987954
    #> 2         hyp 1.0163241 0.3744249  2.714360 19.13894 0.0136971
    #> 
    #> $bmi
    #>          term   estimate std.error   statistic       df      p.value
    #> 1 (Intercept) 27.0037636  2.882558  9.36798555 16.45801 5.297937e-08
    #> 2         hyp -0.1502389  2.191507 -0.06855506 18.35948 9.460849e-01
    #> 
    #> $chl
    #>          term  estimate std.error statistic       df      p.value
    #> 1 (Intercept) 157.88698  24.65957  6.402665 20.91373 2.445428e-06
    #> 2         hyp  29.06627  19.62684  1.480945 19.90150 1.542804e-01
    

    Created on 2023-03-08 with reprex v2.0.2


    Edit: Putting output into a data.frame.

    out <- data.frame(var = vars,
                      coef_int = 0,
                      coef_hyp = 0)
    for(i in seq_along(vars)) {
      out[i, 2:3] <- mods[[i]]$pooled$estimate
    }
    out
    #>   var    coef_int   coef_hyp
    #> 1 age   0.5216374  1.0163241
    #> 2 bmi  27.0037636 -0.1502389
    #> 3 chl 157.8869758 29.0662740
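
    Since the question asks for p-values, the same idea can be sketched so that the pooled hyp row (estimate, standard error and p-value) ends up in one table. This is only a minimal sketch under the same setup as above; the names rows and out_p are made up for illustration, and it assumes that summary() of a pooled fit returns the columns term, estimate, std.error and p.value, as in the printed output above.

    ```r
    library(mice)

    set.seed(123)
    vars <- c("age", "bmi", "chl")
    imps <- mice(nhanes, printFlag = FALSE)

    # Fit one model per outcome on every imputed dataset, pool the fits,
    # and keep only the pooled row for the hyp term.
    # (rows and out_p are illustrative names, not part of mice.)
    rows <- lapply(vars, function(v) {
      fit <- with(imps, glm(as.formula(paste0(v, " ~ hyp"))))
      res <- summary(pool(fit))
      hyp <- res[res$term == "hyp", ]
      data.frame(var       = v,
                 estimate  = hyp$estimate,
                 std.error = hyp$std.error,
                 p.value   = hyp$p.value)
    })

    # Stack the one-row data frames into a single table.
    out_p <- do.call(rbind, rows)
    out_p
    ```

    Using lapply with a small function keeps the loop index out of the formula, and do.call(rbind, ...) avoids growing a data frame row by row.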