My situation is the following: I have a large dataframe containing the data I need for matching analyses. However, the matching has to be done inside subgroups that are defined by certain areas. Because I didn't want to do that "manually" for each subgroup (there are too many), I came up with an approach that splits the initial dataframe into sub-dataframes, each containing the rows of one unique treated area plus the control areas, and saves these dataframes into a list. After this, I performed matching on the dataframes in the list using the matchit function from R's MatchIt package. Here is a heavily simplified example of what the list of dataframes looks like:
> list_df
$A
   name treatment     cov1     cov2 cov3       var
1     A         1 13.65933 200.5809   13 1000.1185
2     A         1 15.80334 233.8301   13 1010.1038
3     A         1 15.16098 215.1046   13  999.8548
4     A         1 16.45487 185.4957   13  997.5585
5     A         1 15.55230 193.5955   13 1001.2822
9     U         0 16.33895 175.6502   13  999.0682
10    U         0 18.05787 197.6041   13 1003.2781
11    U         0 14.29088 229.5446   13 1002.9567
12    U         0 16.32195 238.9975   13  998.9453
13    U         0 15.25240 217.5467   13 1004.0581
14    U         0 14.69154 219.9963   13  999.3270
15    U         0 14.88606 153.6038   15  989.6423
16    U         0 14.34472 212.5205   15  994.6094
17    U         0 14.66233 231.1179   15  999.7775
18    U         0 14.69155 240.4084   15  994.9280
19    U         0 15.63663 198.3323   10 1007.4225
20    U         0 15.19980 183.5846   10  997.6229
$B
   name treatment     cov1     cov2 cov3       var
6     B         1 15.66004 187.1542   15 1004.2311
7     B         1 13.89696 197.5548   15  995.6478
8     B         1 16.17403 204.9423   15 1001.5157
9     U         0 16.33895 175.6502   13  999.0682
10    U         0 18.05787 197.6041   13 1003.2781
11    U         0 14.29088 229.5446   13 1002.9567
12    U         0 16.32195 238.9975   13  998.9453
13    U         0 15.25240 217.5467   13 1004.0581
14    U         0 14.69154 219.9963   13  999.3270
15    U         0 14.88606 153.6038   15  989.6423
16    U         0 14.34472 212.5205   15  994.6094
17    U         0 14.66233 231.1179   15  999.7775
18    U         0 14.69155 240.4084   15  994.9280
19    U         0 15.63663 198.3323   10 1007.4225
20    U         0 15.19980 183.5846   10  997.6229
In the real data, I have seven covariates, two of which are matched exactly.
Here is the code for the matching, combining matchit (with Mahalanobis distance) and lapply:
library(MatchIt)

# Run nearest-neighbour Mahalanobis matching separately on each sub-dataframe
m_obj_Mah <- lapply(area_list, function(x) {
  matchit(Treatment ~ Cov1 + Cov2 + Cov3 + Cov4 + Cov5,
          data = x, method = "nearest",
          exact = ~ Cov6 + Cov7, distance = "mahalanobis")
})
The code above runs without problems. However, when I try to extract the matched datasets, I get an error:
m_data_Mah <- lapply(m_obj_Mah,
function(x) {match.data(x)})
Error in eval(object$call$data, envir = env) : object 'x' not found
The weirdest thing is that I used the same approach for nearest-neighbour propensity score matching with calipers on the same dataset, and the error didn't appear. The error apparently has something to do with using x as the name for each dataframe inside the function passed to lapply, but I can't come up with a solution (either looping through the areas another way or defining x in lapply differently). Any suggestions?
And sorry that I didn't provide any data. It would be quite complicated to generate a realistic dataset and I cannot share the original. I can try to come up with some kind of a dummy dataset if it's absolutely necessary.
Please see this issue, which asks the same question, and the documentation for match.data(), which answers it (see the data argument).
This is an inherent limitation of match.data(), but the solution is simple and documented: supply the original dataset to the data argument of match.data(), like so:
# Pair each matchit object with the dataframe it was fitted on
m_data_Mah <- lapply(seq_along(area_list), function(i) {
  match.data(m_obj_Mah[[i]], data = area_list[[i]])
})
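To illustrate why the lookup of 'x' can fail, here is a rough base-R analogy. It uses lm() as a stand-in for matchit() and is only a sketch of the general mechanism, not MatchIt's internal code; the toy data and the names dfs, fits, stored, lookup and recovered are made up for the example. A fitted object stores the call that created it, and inside that call the data argument is just the symbol x, which no longer exists once the anonymous function has returned.

```r
# Base-R analogy: lm() stands in for matchit(); toy data, hypothetical names.
dfs <- list(
  A = data.frame(y = c(1, 2, 3, 4), z = c(2, 1, 4, 3)),
  B = data.frame(y = c(2, 4, 6, 8), z = c(1, 3, 2, 4))
)

fits <- lapply(dfs, function(x) lm(y ~ z, data = x))

# The stored call remembers only the symbol `x`, not the dataframe itself.
stored <- fits$A$call$data
print(stored)

# Evaluating that symbol outside the anonymous function fails.
# An empty environment on top of baseenv() guarantees no stray `x` is found.
lookup <- tryCatch(
  eval(stored, envir = new.env(parent = baseenv())),
  error = function(e) conditionMessage(e)
)
print(lookup)

# Passing the data explicitly, paired element by element, needs no lookup.
recovered <- Map(function(fit, d) cbind(d, fitted = fitted(fit)), fits, dfs)
```

The same pairing idea carries over to the MatchIt objects: Map(function(m, d) match.data(m, data = d), m_obj_Mah, area_list) should be equivalent to the lapply() call above, and it keeps the list names.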
If you are using version 4.2.0 or higher of MatchIt, using exact will automatically match within subgroups of the exact matching variables (i.e., it will perform separate matching procedures within each one) when using method = "nearest". Setting verbose = TRUE will show which level is currently being matched. You can also use the new rbind() method to combine the matched datasets together (in older versions, you will create statistical errors by using rbind()).