I apologize for my ignorance, I’m jumping into an analysis midway and would appreciate guidance on the appropriate functions to use for estimating odds ratios (ORs).
Here’s the situation: I’ve been provided with a dataset that has already been imputed using the mice package. The original dataset with missing values is not available, and the imputed dataset appears to have been generated using mice::complete(imputed_df, "long", include = FALSE)
. This is my starting point.
Here's the code to reproduce this dataset
library(mice)
library(rms)
library(parameters)
library(splines)
set.seed(123) # For reproducibility
# Number of observations
n <- 100
# outcome
y <- rbinom(n, size = 1, prob = 0.07)
# bmi
bmi <- runif(n, min = 16, max = 60)
bmi[sample(1:n, size = round(0.12 * n))] <- NA # Introduce missing values
# tiktok_ban
tiktok_ban <- sample(1:90, size = n, replace = TRUE)
#tiktok_ban[sample(1:n, size = round(0.05 * n))] <- 0 # Force some values to be 0
# sex
child_sex <- sample(c("Male", "Female"), size = n, replace = TRUE, prob = c(0.49, 0.51))
child_sex[sample(1:n, size = round(0.05 * n))] <- NA # Introduce missing values
# Combine into a data frame
df <- data.frame(y, bmi, tiktok_ban, child_sex)
# View first rows of the dataset
head(df)
#
imputed_df <- mice(df, print = FALSE, m = 20, seed = 24415, method = "pmm", printFlag = FALSE)
imputed_df_l <- mice::complete(imputed_df, "long", include = FALSE)
The goal is to estimate the Odds Ratio of an event (y).
I am modelling tiktok_ban
as nonlinear spline with jump at day 20
d <- imputed_df_l
dd <- datadist(d); options(datadist='dd')
Hmisc::describe(imputed_df_l$tiktok_ban)
k <- attr(rcs(imputed_df_l$tiktok_ban, 6), 'parms')
k
h <- function(x) {
z <- cbind(rcspline.eval(x, k),
jump=x >= 20)
attr(z, 'nonlinear') <- 2 : ncol(z)
z
}
I am able to estimate the OR for this model on individual imputed dataset
f <- lrm(y ~ child_sex + bmi +
gTrans(tiktok_ban, h),
data= subset(imputed_df_l, .imp == 1))
summary(f)
How do I scale this code to run on multiple imputed datasets and generate pooled Odds Ratios and create a plot of Odds Ratios versus tiktok_ban based on the final pooled results?
Please note I cannot change the imputed datasets imputed_df_l
, this is what I was given. Thanks in advance for any help
I have tried fit.mult.impute
but ran into errors and unable to make it work for my usecase.
I've done something similar there. It follows Rubin's logic1, 2.
> PF <- by(imputed_df_l, ~ .imp, \(x) {
+ summary(lrm(y ~ child_sex + bmi + gTrans(tiktok_ban, h), data=x))
+ }) |>
+ simplify2array()
>
> m. <- length(unique(imputed_df_l$.imp))
> Q <- rowMeans(PF[, 'Effect', ]) ## calculate mean estimates
> U <- rowMeans(PF[, 'S.E.', ]) ## calculate within variances
> B <- rowSums(((PF[, 'Effect', ] - Q)^2))/(m. - 1) ## calculate between variances
> T <- U + (1 + 1/m.)*B ## calculate total variances
> cbind(Estimate=Q, 'Std. Error'=sqrt(T))
Estimate Std. Error
bmi -0.2392742 0.8981817
Odds Ratio 0.8175118 NA
tiktok_ban 9.2942707 2.8305722
Odds Ratio 10985.6518883 NA
child_sex - Male:Female 0.6758367 0.9786733
Odds Ratio 1.9660930 NA
I leave the CIs and the plot to you.