First of all: this question might be a duplicate of, or already answered in, this Stack Overflow post.
I want to use the MatchIt package to perform fully blocked matching in my dataset using the Mahalanobis distance. I have two observed covariates (age and sex) that I want to use for matching.
I know that I can perform Mahalanobis-based matching using the following arguments:
library(MatchIt)
formula <- as.formula("group ~ sex_boolean + age")
m.out <- matchit(formula = formula,
                 data = data_df,
                 distance = 'mahalanobis')
site_df_matched <- get_matches(m.out, data = data_df)
But this only performs Mahalanobis-based matching using the nearest neighbor. What if I want to be even stricter? Is it possible to introduce a caliper to Mahalanobis matching? The idea would be the following: for each unit in the minority group, find the unit in the majority group whose Mahalanobis distance to it is smallest and lies within a defined caliper. If no unit from the majority group lies within the caliper, the respective unit from the minority group should be discarded.
The outcome should be treatment and control groups of equal size containing pairs of units that are close in the respective covariates. The 'closeness' should be controllable by how strictly the caliper is set. Stricter calipers would lead to more discarded units from the minority group.
Maybe I am also misunderstanding the Mahalanobis-based matching procedure, but is it possible (and recommended) to do this with MatchIt?
Yes, this is straightforward using MatchIt version 4.0.0 and greater. If you want to match on the Mahalanobis distance but include a propensity score caliper, the distance argument needs to correspond to the propensity score and the mahvars argument controls on which covariates Mahalanobis distance matching is performed. For example, to perform Mahalanobis distance matching on sex and age after estimating a propensity score that contained other variables (e.g., race and educ) in addition to these two, you would run the following code:
m.out <- matchit(treat ~ age + sex + race + educ, # variables used in PS
                 data = data_df,                  # dataset
                 distance = "glm",                # method of estimating PS
                 caliper = .25,                   # width of caliper on PS
                 mahvars = ~ age + sex)           # variables used in Mahalanobis distance
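To see the first syntax run end to end, here is a minimal sketch on the lalonde dataset that ships with MatchIt. That dataset has no sex variable, so married stands in for it here; that substitution, and the object names m.ps and matched_ps, are assumptions made only for illustration.

```r
library(MatchIt)

# Example data bundled with MatchIt (treat, age, educ, race, married, ...)
data("lalonde", package = "MatchIt")

# Propensity score estimated from four covariates; pairs are chosen by
# Mahalanobis distance on age and married (standing in for sex), and only
# among controls within the propensity score caliper.
m.ps <- matchit(treat ~ age + educ + race + married,
                data = lalonde,
                distance = "glm",
                caliper = .25,
                mahvars = ~ age + married)

summary(m.ps)                    # balance and counts of matched/unmatched units
matched_ps <- get_matches(m.ps)  # one row per matched unit, with a subclass id
```

Treated units with no control inside the caliper are left unmatched, so the matched groups end up the same size.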
If you want to perform Mahalanobis distance matching without involving a propensity score, the code below accomplishes that:
m.out <- matchit(treat ~ age + sex,
                 data = data_df,
                 distance = "mahalanobis")
If you need to estimate a propensity score for any reason (e.g., a caliper or common support), you must use the first syntax. If no propensity score is involved, the second syntax works. You can still place calipers on the pairs with the second syntax as long as the calipers are on other supplied variables; for example, to place a caliper of .25 standard deviations of age, you could enter caliper = c(age = .25). You can place calipers on multiple variables at a time, including the propensity score if the first syntax is used.
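As a rough sketch of the second syntax with a covariate caliper, the code below uses simulated data. The data frame and its values are made up for illustration; only the matchit() arguments come from the description above.

```r
library(MatchIt)

# Simulated data for illustration only; variable names mirror the examples above
set.seed(123)
n <- 300
data_df <- data.frame(
  treat = rbinom(n, 1, 0.3),
  age   = round(rnorm(n, 45, 12)),
  sex   = rbinom(n, 1, 0.5)
)

# Mahalanobis distance matching on age and sex, with a caliper of
# .25 standard deviations on age (no propensity score involved)
m.cal <- matchit(treat ~ age + sex,
                 data = data_df,
                 distance = "mahalanobis",
                 caliper = c(age = .25))

summary(m.cal)                    # shows how many treated units went unmatched
matched_df <- match.data(m.cal)   # matched units only
```

Tightening the caliper (e.g., c(age = .1)) should leave more treated units unmatched, which is the trade-off the question asks about.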
This is all detailed in the help page for nearest neighbor matching, which can be reviewed here or with ?method_nearest.
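For intuition about the procedure described in the question (nearest control by Mahalanobis distance, discarded if outside a caliper), here is a base R sketch. It is not MatchIt's implementation: the greedy matching order, the full-sample covariance matrix, the caliper value md_caliper, and the simulated data are all assumptions made for illustration.

```r
# Simulated data for illustration only
set.seed(1)
n <- 60
data_df <- data.frame(
  treat = rep(c(1, 0), times = c(20, 40)),
  age   = round(runif(n, 20, 70)),
  sex   = rbinom(n, 1, 0.5)
)

X <- as.matrix(data_df[, c("age", "sex")])
S <- cov(X)                        # full-sample covariance (an assumption)
treated   <- which(data_df$treat == 1)
available <- which(data_df$treat == 0)
md_caliper <- 0.5                  # hypothetical caliper on the Mahalanobis distance

pairs <- data.frame(treated = integer(0), control = integer(0), dist = numeric(0))
for (i in treated) {
  if (length(available) == 0) break
  # stats::mahalanobis() returns squared distances, hence the square root
  d <- sqrt(mahalanobis(X[available, , drop = FALSE], center = X[i, ], cov = S))
  j <- which.min(d)
  if (d[j] <= md_caliper) {
    # keep the pair and remove the control from the pool (no replacement)
    pairs <- rbind(pairs, data.frame(treated = i,
                                     control = available[j],
                                     dist = unname(d[j])))
    available <- available[-j]
  }
  # otherwise the treated unit is discarded
}

pairs
```

Because the matching is greedy, the result depends on the order in which treated units are processed.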