I have generated 10 model summaries after fitting the same model on 10 different subsets of the dataset which is as follows
library(mice)
data("nhanes")
head(nhanes)
imp <- mice(nhanes, print = FALSE, m = 10, seed = 24415)
df <- complete(imp, "long")
model_fit <- lapply(1:10, function(i) {
model = lm(bmi ~ age + hyp + chl,
data = subset(df, `.imp`==i))
})
From this I get different ggpredict
objects
ggpredict(model_fit[[1]], c("age", "hyp"))
ggpredict(model_fit[[2]], c("age", "hyp"))
ggpredict(model_fit[[3]], c("age", "hyp"))
ggpredict(model_fit[[4]], c("age", "hyp"))
ggpredict(model_fit[[5]], c("age", "hyp"))
ggpredict(model_fit[[6]], c("age", "hyp"))
ggpredict(model_fit[[7]], c("age", "hyp"))
ggpredict(model_fit[[8]], c("age", "hyp"))
ggpredict(model_fit[[9]], c("age", "hyp"))
ggpredict(model_fit[[10]], c("age", "hyp"))
I am looking for an efficient way to a) Estimate the average of all the ggpredict
objects by hp and age
The expected output would look like this.
age hp Predicted 95% C.I
--------------------------------------------------------
1 1 (28.38 + 29.35 + 27.3...)/10 (26.67 +2 6.83 + 25.25...)/10 ; (30.08 + 31.87 + 29.35....)/10
2 1 (24.21 + 26.01 + 25.40...)/10 (22.71 + 23.56 + 23.54...)/10 ; (25.71 + 28.46 + 27.26....)/10
3 1 (20.05 + 22.67 + 23.51...)/10 (17.13 + 17.89 + 20.08..)/10 ; (22.96 + 27.44 + 26.94...)/10
1 2 (31.82 + 29.35 +28.87...)/10 (28.07 + 23.58 + 24.24...)/10 ; (35.58 + 35.13 + 33.49....)/10
2 2 (27.66 + 26.01 +26.97...)/10 (24.88 + 22.02 + 23.43...)/10 ; (30.43 + 30.00 + 30.52....)/10
3 2 (23.49 + 22.67 +25.08...)/10 (20.63 + 18.73 + 21.50...)/10 ; (26.35 + 26.61 + 28.66....)/10
b) Plot based on the final averaged values using ggplot function.
So far I tried storing the results from each ggpredict
function as list object and
`Reduce(`+`, list_ggpred)/length(list_ggpred)`
I got warning,
" In Ops.factor(left, right) : `+1 not meaningful for factors.
Any suggestions highly appreciated. Thanks.
Simply use pool_predictions()
:
library(ggeffects)
# example for multiple imputed datasets
data("nhanes2", package = "mice")
imp <- mice::mice(nhanes2, printFlag = FALSE)
predictions <- lapply(1:5, function(i) {
m <- lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i))
predict_response(m, "age")
})
pool_predictions(predictions)
#> # Predicted values of bmi
#>
#> age | Predicted | 95% CI
#> --------------------------------
#> 20-39 | 30.09 | 28.17, 32.02
#> 40-59 | 24.60 | 21.48, 27.72
#> 60-99 | 21.75 | 18.24, 25.26
#>
#> Adjusted for:
#> * hyp = no
#> * chl = 196.32
# and:
# pool_predictions(predictions) |> plot()
Created on 2024-04-30 with reprex v2.1.0
Note that pool_predictions()
account for the missing values / multiple imputations and adjusts the standard error, thus you get slightly larger confidence intervals than just taking the mean.