rsurveyimputationvariancer-mice

Getting within and in-between variances and lambda ratio with MIcombine?


I'm pooling estimates from a raking weighting procedure on multiple imputed data and I'm interested in describing how the variability from the imputation by chained equation carry over my estimates. I want to do this by reporting U-hat (i.e.: the average within variance of the estimate), B (the in-between variance or the extra variance caused by the fact that there is missing data in the sample) and λ (i.e.: the proportion of total variance due to missingness or the ratio of the B on the total variance).

These parameters are usually easily accessible with the pool() function from mice package. However, I'm using MIcombine because it is more convenient if we want to apply raking procedure to multiple imputed data.

So my question is : how can I get the within and in-between variances and the lambda ratio from MIcombine?

Here is a reproducible code.

library(mitools)
library(survey)
library(mice)

data(nhanes)

nhanes2$hyp <- as.factor(nhanes2$hyp)

imp <- mice(nhanes2,method=c("polyreg","pmm","logreg","pmm"), seed = 23109)

imp_list <- lapply( 1:5 , function( n ) complete( imp , action = n ) )


des<-svydesign(id=~1, data=imputationList(imp_list))
small.svy.rake<-des


age.dist <- data.frame(age =  c("20-39","40-59", "60-99"),
                       Freq = nrow(des) * c(0.5, 0.3, .2))

# loop through each of the implicates
# applying the `rake` function to each
small.svy.rake$designs <- 
  lapply( 
    des$designs , 
    rake ,
    sample.margins = list(~age),
    population.margins = list(age.dist)
  )

# new mean bmi estimated after raking with pooled standard error
summary(MIcombine( with( small.svy.rake , svymean( ~ bmi ) ) ))

Within standard error can be extract from the list of raking operations on the imputed data

rake.list<-with( des , svymean( ~ bmi ) )#create a list object

datalong<-do.call(rbind, lapply(rake.list , function(x) data.frame(x)))#transform into long format data frame

mean(datalong[,2]) #mean standard error=within standard error

However, I don't know how to obtain the two other components, i.e.: the in-between variance and the lambda parameter and would enjoy any help to get closer to that goal.


Solution

  • Following @Thomas Lumley comment, here are the adjustments to print the in-between variance and the lambda ratio:

    MIcombine_addparam<-function (results, variances, call = sys.call(), df.complete = Inf, 
              ...) 
    {
      m <- length(results)
      oldcall <- attr(results, "call")
      if (missing(variances)) {
        variances <- suppressWarnings(lapply(results, vcov))
        results <- lapply(results, coef)
      }
      vbar <- variances[[1]]
      cbar <- results[[1]]
      for (i in 2:m) {
        cbar <- cbar + results[[i]]
        vbar <- vbar + variances[[i]]
      }
      cbar <- cbar/m
      vbar <- vbar/m
      evar <- var(do.call("rbind", results))
      r <- (1 + 1/m) * evar/vbar
      df <- (m - 1) * (1 + 1/r)^2
      if (is.matrix(df)) 
        df <- diag(df)
      if (is.finite(df.complete)) {
        dfobs <- ((df.complete + 1)/(df.complete + 3)) * df.complete * 
          vbar/(vbar + evar)
        if (is.matrix(dfobs)) 
          dfobs <- diag(dfobs)
        df <- 1/(1/dfobs + 1/df)
      }
      if (is.matrix(r)) 
        r <- diag(r)
      rval <- list(coefficients = cbar, variance = vbar + evar * 
                     (m + 1)/m,B=evar,lambda=r/(r+1), call = c(oldcall, call), nimp = m, df = df, 
                   missinfo = (r + 2/(df + 3))/(r + 1))
      class(rval) <- "MIresult"
      rval
    }
    

    In-between variance is computed directly by the function so we retain evar as B in the output. For the proportion of variance due to missingness, the function computes the ratio r, which is the relative increase in variance due to nonresponse. Following Van Buuren, it is linked to lambda by r=λ/(1−λ), so can be solved for lambda by λ=r/(r+1).