I conducted multiple imputation using the 'mice' package in R. Afterwards, I calculated pooled regression analyses using the 'with' and 'pool' functions.
For further analyses, I only want to look at a specific subsample of the data. I would like to use the imputed data with pooled regression analysis for that as well.
However, I am struggling to find a way to achieve that. That is because pooled regression analysis in 'mice' works by using the 'with' and 'lm' functions on an object of class 'mids', instead of just calling 'lm' on a dataframe. Therefore, I can't just subset the data by conventional means, such as using square brackets or the 'subset' function.
I know that I could theoretically just extract the imputed datasets using the 'complete' function, conduct regression analyses on these datasets, and then pool the results by hand, but I would like to avoid that.
An example of what I want to do would be:
library(mice)
data <- as.data.frame(matrix(data = c(3, 2, 3, 4, 5, NA, 7, 10, 9, NA, NA, 12, 13, 14, 15, 16, NA, 18), nrow = 6))
names(data) <- c("a", "b", "c")
data$Sex <- c("male", "male", "female", "male", "female", "female")
imp <- mice(data = data,
            m = 20,
            maxit = 10,
            seed = 12,
            print = FALSE)
Now, I can conduct pooled regression analysis by using:
summary(pool(with(imp, lm(a ~ b + c))))
What I am struggling to achieve is conducting a regression analysis on only the male subjects.
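mice returns an object of class mids, which can be subset by a logical condition using filter. mice provides a method for the dplyr filter generic for mids objects; if filter resolves to stats::filter in your session, load dplyr first or call dplyr::filter explicitly: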
filter(imp, Sex %in% "male")
# or for more detail
imp_filtered <- filter(imp, Sex %in% "male")
imp_filtered$data
# a b c Sex
#1 3 7 13 male
#2 2 10 14 male
#4 4 NA 16 male
So to implement this, you can save a filtered object or modify your code slightly:
# save filtered data to new object
imp_filtered <- filter(imp, Sex %in% "male")
summary(pool(with(imp_filtered, lm(a ~ b + c))))
# or all in one go
summary(pool(with(filter(imp, Sex %in% "male"), lm(a ~ b + c))))
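If you want to cross-check the filtered result, the manual route mentioned in the question can be sketched in a few lines. This is only a rough sketch under a few assumptions: it uses the nhanes data that ships with mice instead of the six-row toy data (which leaves too few male rows to estimate three coefficients), and the model chl ~ bmi and the condition age >= 2 are placeholders chosen for illustration, not anything from the question.

```r
library(mice)

# nhanes ships with mice: 25 rows, columns age, bmi, hyp, chl
imp <- mice(nhanes, m = 5, maxit = 5, seed = 12, printFlag = FALSE)

# Fit the same model on the subsample of each completed dataset
fits <- lapply(seq_len(imp$m), function(i) {
  d <- complete(imp, i)                  # i-th completed dataset
  lm(chl ~ bmi, data = d[d$age >= 2, ])  # placeholder subsample condition
})

# as.mira() wraps the list of fits so that pool() can combine them
pooled <- pool(as.mira(fits))
summary(pooled)
```

The pooled estimates from this route should agree with those from the filter approach for the same condition, which makes it a handy sanity check even if you prefer the shorter filter call.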