I conducted propensity score matching in R using the packages "Matching" and "MatchIt", but the two packages returned completely different numbers of matches.
The dataset is here http://web.hku.hk/~bcowling/data/propensity.csv or http://web.hku.hk/~bcowling/examples/propensity.htm.
example <- propensity
The code using "Matching" was:
m.ps <- glm(trt ~ age + risk + severity, family="binomial", data=example)
example$ps <- predict(m.ps, type="response")
PS.m <- Match(Y=example$death, Tr=example$trt, X=example$ps, M=1, caliper=0.2, replace=FALSE)
summary(PS.m)
SE......... 0.041299
T-stat..... -2.1126
p.val...... 0.034634
Original number of observations.............. 400
Original number of treated obs............... 192
Matched number of observations............... 149
Matched number of observations (unweighted). 149
Caliper (SDs)........................................ 0.2
Number of obs dropped by 'exact' or 'caliper' 43
The number of matches was 149.
The code using "MatchIt" was:
psm <- matchit(trt ~ age + risk + severity, data = example, method = "nearest", caliper = 0.2)
summary(psm)
Sample Sizes:
          Control Treated
All           208     192
Matched       161     161
Unmatched      47      31
Discarded       0       0
MatchIt produced 161 matched pairs, whereas Matching produced 149. Why are they different?
There are two reasons: 1) Matching proceeds through the matches in the order of the units in the dataset, while MatchIt by default proceeds in descending order of the propensity score; and 2) Matching uses a nonzero distance tolerance by default, meaning that any two units with a propensity score difference of .00001 or less are considered exactly matched, whereas MatchIt has no such tolerance.
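To make both points concrete, here is a minimal base-R sketch on made-up numbers. It is not the code either package runs internally: greedy_match() is a hypothetical helper, the caliper is in raw score units rather than standard deviations, and the tolerance rule is a simplified stand-in for distance.tolerance. It only illustrates that greedy matching without replacement depends on the processing order, and that a tolerance can turn a unique nearest neighbor into a tie.

```r
# Toy greedy 1:1 nearest-neighbor matching without replacement.
# `order_idx` is the order in which treated units pick their control.
# Hypothetical helper for illustration only.
greedy_match <- function(ps_treated, ps_control, caliper, order_idx) {
  available <- rep(TRUE, length(ps_control))
  matched <- rep(NA_integer_, length(ps_treated))
  for (i in order_idx) {
    d <- abs(ps_control - ps_treated[i])
    d[!available] <- Inf              # used controls cannot be reused
    j <- which.min(d)
    if (d[j] <= caliper) {            # accept only matches within the caliper
      matched[i] <- j
      available[j] <- FALSE
    }
  }
  matched
}

# Made-up propensity scores
ps_treated <- c(0.50, 0.60)
ps_control <- c(0.54, 0.70)
caliper <- 0.12

# Reason 1: processing order
in_data_order <- greedy_match(ps_treated, ps_control, caliper,
                              seq_along(ps_treated))
largest_first <- greedy_match(ps_treated, ps_control, caliper,
                              order(ps_treated, decreasing = TRUE))

sum(!is.na(in_data_order))  # 2 pairs: 0.50 takes 0.54, then 0.60 takes 0.70
sum(!is.na(largest_first))  # 1 pair: 0.60 takes 0.54, leaving 0.50 unmatched

# Reason 2: distance tolerance (toy rule: tiny distances count as zero)
d_raw <- c(8e-6, 3e-6, 0.02)
tol <- 1e-5
d_tol <- ifelse(d_raw < tol, 0, d_raw)

which(d_raw == min(d_raw))  # 2: a unique nearest neighbor
which(d_tol == min(d_tol))  # 1 2: now a tie between two candidates
```

With only two treated units, the order alone changes the number of matched pairs from 2 to 1, which is the same kind of discrepancy as 161 versus 149 in the real data.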
To ensure the results are the same between Matching and MatchIt, set m.order = "data" in matchit() and set distance.tolerance = 0 in Match().
PS.m <- Match(Y = example$death, Tr = example$trt, X = example$ps, M = 1,
              caliper = 0.2, replace = FALSE, ties = FALSE,
              distance.tolerance = 0)
psm <- matchit(trt ~ age + risk + severity, data = example,
               method = "nearest", caliper = 0.2, m.order = "data")
cobalt::bal.tab(psm, weights = PS.m)
#> Call
#> matchit(formula = trt ~ age + risk + severity, data = example,
#> method = "nearest", m.order = "data", caliper = 0.2)
#>
#> Balance Measures
#>              Type Diff.matchit Diff.Match
#> distance Distance       0.0043     0.0043
#> age       Contin.       0.0902     0.0902
#> risk      Contin.      -0.0348    -0.0348
#> severity  Contin.      -0.0342    -0.0342
#>
#> Effective sample sizes
#>         Control Treated
#> All         208     192
#> matchit     149     149
#> Match       149     149
Created on 2022-02-22 by the reprex package (v2.0.1)
Here I used cobalt::bal.tab() to verify that the resulting matched sample sizes are the same and that the balance statistics are identical, indicating that both methods produce the same matched sample.