r, propensity-score-matching

Balance table with frequencies and proportions after weighting


In many scientific papers, covariate balance is presented in Table 1 before and after weighting.
Continuous variables, for example, are presented using the mean and the standard deviation, and binary variables using frequency and proportions.

I do not know how to conveniently display the frequencies and proportions after weighting, because the disp argument in bal.tab(..., disp = c("means", "sds")) is limited to "means" and "sds".

Example data:

library(cobalt)
library(dplyr)

set.seed(123)
lalonde <- cbind(lalonde,
                 event = sample(c(0,1), size=614, replace=TRUE, prob=c(0.84,0.16)),
                 time = runif(614, min=10, max=365))

formula <- treat ~ age + educ + race + married + nodegree + re74 + re75 + re78

# PS
lalonde$pscore <- glm(formula, data = lalonde,
                      family = binomial(link = "logit"))$fitted.values

# Calculate weights
lalonde$weight <- ifelse(lalonde$treat == 1,
                         pmin(lalonde$pscore, 1 - lalonde$pscore) / lalonde$pscore,
                         pmin(lalonde$pscore, 1 - lalonde$pscore) / (1 - lalonde$pscore))

I used bal.tab to display means and SDs for continuous variables. IMO, for binary variables, the weighted mean (e.g. M.0.Adj) is a weighted rate.

bal.tab(formula, data = lalonde, thresholds = c(m = .1), un = TRUE, disp = c("means", "sds"), weights = lalonde$weight)

which results, for example, in:

             Type         M.0.Adj  SD.0.Adj   M.1.Adj  SD.1.Adj Diff.Adj
married      Binary       0.2475         .    0.2581         .   0.0105

Is there a solution to derive the frequency and proportions for binary variables?


Solution

  • The weighted mean of a binary variable is the weighted proportion of units with that characteristic, so the M.0.Adj and M.1.Adj columns from bal.tab() are already the proportions you are looking for. It doesn't make sense to request a weighted frequency; I describe why in this answer. There is no principled way to compute one, and it is not useful information anyway, so you should not attempt to report it. I don't understand what you mean by saying the weighted mean is a weighted rate and not a weighted proportion. Use bal.tab() and report the weighted mean as a weighted proportion. This is best practice and is what papers that use IPW typically do.
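
To make the first point concrete, here is a minimal sketch in base R. It uses a small made-up data frame (the values in `d` are hypothetical, not taken from lalonde) and shows that the weighted mean of a 0/1 variable within a treatment group is the same number as the share of total weight held by units with value 1, i.e. the weighted proportion.

```r
# Toy data: hypothetical values chosen for illustration only
d <- data.frame(
  treat   = c(0, 0, 0, 0, 1, 1, 1, 1),
  married = c(1, 0, 0, 1, 1, 1, 0, 0),
  weight  = c(0.5, 1.0, 0.25, 0.25, 1.0, 0.5, 0.5, 2.0)
)

# Weighted mean of the binary variable within each treatment group
weighted_prop <- sapply(split(d, d$treat), function(g) {
  weighted.mean(g$married, g$weight)
})

# The same quantity written out as a proportion:
# weight of married units divided by total weight in the group
by_hand <- sapply(split(d, d$treat), function(g) {
  sum(g$weight[g$married == 1]) / sum(g$weight)
})

print(weighted_prop)
print(by_hand)

# Formatted as percentages, e.g. for a Table 1 column
print(sprintf("%.1f%%", 100 * weighted_prop))
```

Applied to the data in the question, the same `weighted.mean()` call on `married` and `weight` within each level of `treat` should agree with M.0.Adj and M.1.Adj up to rounding, assuming bal.tab() computes its adjusted means as ordinary weighted means.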