r imputation hmisc

R - Getting Imputed Missing Values back into dataframe


I'm using aregImpute to impute missing values in an R data frame (bn_df).

The code is this:

library(Hmisc)
impute_arg <- aregImpute(~ TI_Perc + AS_Perc + 
                         CD_Perc + CA_Perc + FP_Perc, 
                         data = bn_df, n.impute = 5)

It works fine.

The problem comes afterwards, when putting the imputed values back into the original data frame.

I can do it, just not in a very elegant way. I basically have to copy/paste the following line for all columns:

bn_df$CD_Perc[impute_arg$na$CD_Perc] <- impute_arg$imputed$CD_Perc[,1]
bn_df$FP_Perc[impute_arg$na$FP_Perc] <- impute_arg$imputed$FP_Perc[,1]
...

This works. But there has to be a more efficient way to accomplish this without copy/paste for all columns.

Any ideas?


Solution

  • You can use the function impute.transcan from Hmisc. Since you have not provided your data, I have adapted an example from aregImpute's documentation.

    library(Hmisc)

    # The data
    x1 <- factor(sample(c('a','b','c'),1000,TRUE))
    x2 <- (x1=='b') + 3*(x1=='c') + rnorm(1000,0,2)
    x3 <- rnorm(1000)
    y  <- x2 + 1*(x1=='c') + .2*x3 + rnorm(1000,0,2)
    orig.x1 <- x1[1:250]
    orig.x2 <- x2[251:350]
    
    # Insert NAs
    x1[1:250] <- NA
    x2[251:350] <- NA
    
    # Create a data frame 
    d <- data.frame(x1,x2,x3,y)
    # Fit the imputation model. Passing several values of nk lets aregImpute
    # find the one that yields the best validating imputation models.
    # tlinear=FALSE means to not force the target variable to be linear.
    f <- aregImpute(~y + x1 + x2 + x3, nk=c(0,3:5), tlinear=FALSE,
                    data=d, B=10) # normally B=75
    
    # Get the completed variables for the first imputation
    imputed <- impute.transcan(f, data=d, imputation=1, list.out=TRUE,
                               pr=FALSE, check=FALSE)
    
    # Copy them into a copy of the original data frame. Assigning by name
    # keeps the column order of d and preserves factors such as x1, which
    # do.call(cbind, imputed) would flatten to their integer codes.
    imputed.data <- d
    imputed.data[names(imputed)] <- imputed
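
If you would rather keep the manual approach from the question, the copy/paste can be replaced with a loop over the variable names. The following is only a sketch: fill_imputed is a hypothetical helper (not part of Hmisc), and it assumes the fit stores row indices in $na and one matrix per variable in $imputed (NULL for variables without missing values), which is the layout the question's code relies on. The toy objects are made up so the example runs without Hmisc.

```r
# Hypothetical helper (not part of Hmisc): copy one set of imputed values
# from an aregImpute-style fit back into the data frame it was fitted on.
fill_imputed <- function(df, fit, imputation = 1) {
  for (v in names(fit$imputed)) {
    imp <- fit$imputed[[v]]
    if (is.null(imp)) next  # assumed: NULL means no missing values for v
    # Rows listed in fit$na[[v]] receive the chosen imputation column
    df[[v]][fit$na[[v]]] <- imp[, imputation]
  }
  df
}

# Toy stand-in with the assumed $na / $imputed layout; the numbers are made up.
toy_df  <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 2, 3, 4), c = 1:4)
toy_fit <- list(
  na      = list(a = c(2L, 4L), b = 1L, c = integer(0)),
  imputed = list(a = matrix(c(10, 20, 11, 21), nrow = 2),  # 2 NAs x 2 imputations
                 b = matrix(c(30, 31), nrow = 1),          # 1 NA  x 2 imputations
                 c = NULL)                                 # no NAs in c
)
toy_filled <- fill_imputed(toy_df, toy_fit, imputation = 1)
```

With the objects from the question, the call would be bn_df <- fill_imputed(bn_df, impute_arg). This is meant for numeric columns like the *_Perc ones; for factor columns I believe the imputed matrix holds integer codes rather than levels, so impute.transcan is the safer route there.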