I'm trying to impute missing values in a dataset using the Hmisc package. I'm following this guide.
Here is a reproducible example of my code:
# Create the dataset and randomly set 10% of the values to NA
data <- iris
library(missForest)
library(Hmisc)
iris.mis <- prodNA(iris, noNA = 0.1)
#Calculating imputed values with aregImpute
impute_arg <- aregImpute(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width + Species, data = iris.mis, n.impute = 5)
completeData2 <- impute.transcan(impute_arg, imputation=1, data=iris.mis, list.out=TRUE,pr=FALSE, check=FALSE)
head(completeData2)
# Fit a model across the multiple imputations
library(rms)
fmi <- fit.mult.impute(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, ols, impute_arg, data=iris.mis)
My question is: how do I apply this fitted model to my data and impute the NA values in my dataset (iris.mis)?
Answers with code snippets would be greatly appreciated.
All you need to do is get the model's predictions:
model_predictions <- predict(fmi)
Now you can examine the predictions at the data's missing indices:
missing <- which(is.na(iris.mis$Sepal.Length))
imputed <- model_predictions[missing]
imputed
#> 5 22 27 32 34 35 54 60
#> 5.073695* 5.119113* 5.182343* 4.949794* 5.381427* 4.863149* 5.565716* 5.596861*
#> 89 102 107 117 131 135 145 149
#> 5.950823* 6.217764* 5.757642* 6.829916* 7.116657* 6.726274* 6.738296* 6.662452*
#> 150
#> 6.428420*
And see how they compare to the actual values:
actual <- iris$Sepal.Length[missing]
plot(x = actual, y = imputed, xlim = c(4, 8), ylim = c(4, 8), col = "red",
xlab = "Actual", ylab = "Imputed", main = "Imputed vs Actual Sepal Length")
lines(c(4, 8), c(4, 8), lty = 2)
# calculate residuals
imputed - actual
#> 5 22 27 32 34 35
#> 0.07369483* 0.01911295* 0.18234346* -0.45020634* -0.11857279* -0.03685114*
#> 54 60 89 102 107 117
#> 0.06571631* 0.39686061* 0.35082282* 0.41776385* 0.85764178* 0.32991602*
#> 131 135 145 149 150
#> -0.28334270* 0.62627448* 0.03829600* 0.46245174* 0.52842038*
# sum of squared errors
sum((imputed - actual)^2)
#> [1] 2.52802
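To put the sum of squared errors back on the scale of the measurements, one option is the root mean squared error. The sketch below is an addition, not part of the original workflow: it re-enters the residuals printed above by hand (the name `residuals_shown` is made up for this example) and uses only base R.

```r
# Residuals (imputed - actual) copied by hand from the output above
residuals_shown <- c(
   0.07369483,  0.01911295,  0.18234346, -0.45020634, -0.11857279, -0.03685114,
   0.06571631,  0.39686061,  0.35082282,  0.41776385,  0.85764178,  0.32991602,
  -0.28334270,  0.62627448,  0.03829600,  0.46245174,  0.52842038
)

sse  <- sum(residuals_shown^2)         # sum of squared errors, same quantity as above
rmse <- sqrt(mean(residuals_shown^2))  # root mean squared error, in the units of Sepal.Length
mae  <- mean(abs(residuals_shown))     # mean absolute error, less dominated by the largest miss

round(c(SSE = sse, RMSE = rmse, MAE = mae), 3)
```

With these 17 residuals the RMSE comes out at roughly 0.39, which is easier to judge against sepal lengths of 4 to 8 than the raw sum of squares.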
So, if you want a new column in your data set with the missing values filled in by these predictions, you can do:
iris.mis$Sepal.Length.Imputed <- iris.mis$Sepal.Length
iris.mis$Sepal.Length.Imputed[is.na(iris.mis$Sepal.Length.Imputed)] <- imputed
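If it helps to see the fill-in pattern in isolation, here is a minimal base-R sketch of the same idea. It is a simplified stand-in, not the Hmisc workflow above: it swaps `fit.mult.impute` with `ols` for a plain `lm`, and it assumes that only `Sepal.Length` has missing values so every row with a missing response still has complete predictors. The names `dat`, `na_rows`, and `is_missing` are hypothetical.

```r
# Simplified illustration: only the response column gets NAs here,
# whereas prodNA() in the question scatters NAs across all columns.
set.seed(1)                        # arbitrary seed so the example is reproducible
dat <- iris
na_rows <- sample(nrow(dat), 15)   # hypothetical choice of rows to blank out
dat$Sepal.Length[na_rows] <- NA

# lm() drops rows with a missing response by default (na.action = na.omit)
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
          data = dat)

# Predict only for the rows whose response is missing
is_missing <- is.na(dat$Sepal.Length)
preds <- predict(fit, newdata = dat[is_missing, ])

# Keep the original column and store the filled-in copy next to it
dat$Sepal.Length.Imputed <- dat$Sepal.Length
dat$Sepal.Length.Imputed[is_missing] <- preds

sum(is.na(dat$Sepal.Length.Imputed))   # no missing values remain in the new column
```

Keeping the original column untouched makes it easy to tell afterwards which values were observed and which were filled in.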