My main goal is conducting mediation (using the mediation
R package) while accounting for sampling weights (using the survey
package). This led me to discover that weights are stored differently for models fit with survey::svyglm
and for models fit with survey::svyolr
.
set.seed(3459823)
library(survey)
library(srvyr)
library(tidyverse)
library(mediation)
n <- 100
# simulate data
dat <- tibble::tibble(
id = 1:100,
y_binary = sample(c(0,1), size = n, replace = TRUE),
m_binary = sample(c(0,1), size = n, replace = TRUE),
m_ordinal = sample(c(1, 2, 3), size = n, replace = TRUE, prob = c(0.2, 0.3, 0.5)) |>
as.factor() |>
as.ordered(),
pred_normal = rnorm(n = n),
weight = rnorm(n = n, mean = 15),
weight_rep1 = rnorm(n = n, mean = weight),
weight_rep2 = rnorm(n = n, mean = weight),
weight_rep3 = rnorm(n = n, mean = weight),
weight_rep4 = rnorm(n = n, mean = weight),
weight_rep5 = rnorm(n = n, mean = weight)
)
# create design object
dat_design <- dat |>
srvyr::as_survey_rep(
repweights = dplyr::contains("_rep"),
weights = weight,
combined_weights = TRUE
)
# ordinal mediator models
model_y_binary <- survey::svyglm(
formula = y_binary ~ m_ordinal + pred_normal,
design = dat_design,
family = "binomial"
)
model_m_ordinal <- survey::svyolr(
formula = m_ordinal ~ pred_normal,
design = dat_design
)
mediation_ordinal <- mediation::mediate(
model.m = model_m_ordinal,
model.y = model_y_binary,
sims = 50,
treat = "pred_normal",
mediator = "m_ordinal"
)
# Won't run!
# Error in mediation::mediate(model.m = model_m_ordinal, model.y = model_y_binary, :
# weights on outcome and mediator models not identical
I then inspected the weights saved in the model objects using the model.frame()
function, which is also what mediation::mediate
seems to use under the hood to extract the weights.
# Inspecting the weights saved by the survey models:
model.frame(model_y_binary) |> dplyr::slice_head()
# Result:
# y_binary m_ordinal pred_normal (weights)
# 1 1 3 -0.09479119 1.004946
# so, the binomial model stores the weights in a column in the model.frame
model.frame(model_m_ordinal) |> dplyr::slice_head()
# Result:
# m_ordinal pred_normal (weights)
# 1 3 -0.09479119 14.92443
# so, the ordinal model stores the weight as well, but they are different!
weight_comparison <- tibble::tibble(
binomial_weights = model.frame(model_y_binary)$`(weights`,
ordinal_weights = model.frame(model_m_ordinal)$`(weights)`,
weight_quotient = binomial_weights / ordinal_weights
)
weight_comparison |> dplyr::slice_head(n = 5)
# Result:
# binomial_weights ordinal_weights weight_quotient
# <dbl> <dbl> <dbl>
# 1 1.00 14.9 0.0673
# 2 1.03 15.2 0.0673
# 3 0.992 14.7 0.0673
# 4 1.01 14.9 0.0673
# 5 0.927 13.8 0.0673
# so, all the binomial weights seem to be equal to the ordinal weights times ~0.0673
Why does this happen?
Is there a way to alter the fit survey::svyolr
model object and force the weights to be equal to those stored in the svyglm
model object?
Could the mediation
package be altered to take this into account? For example by just taking the weights from model.m and not comparing them to model.y? (Of course this could also lead to other problems.)
Some functions in the survey package rescale the weights to have unit mean because the large weights (thousands or tens of thousands) in some national surveys can cause convergence problems.
Since the results aren't affected at all, the mediate
package could probably work around this fairly easily. However, there's a rescale=FALSE
option to svyglm
to not rescale the weights, provided for this sort of purpose.
If you then have convergence problems in svyglm
you could manually rescale the weights to have unit mean before doing any of the analyses.