Tags: r, match, spatial, stratum

Create case-control match by distance for conditional logistic regression in R


Aloha,

I am planning a case-control study of sites that are evenly distributed across the country. I need to take each case in the dataset and match it to x controls (we will use a sensitivity analysis to choose the optimal number of matches, so I need to be able to run it for 1, 2, 3, 4, 5, 6, 7, 8, etc. controls). Because the data have a spatial element, I want to base the matching on a distance matrix and select only controls within 25,000 meters of each case.

I cannot find a suitable algorithm for this computation in R. Is anyone aware of an R package that would help me achieve this?

Thank you


Solution

  • To solve this, I did the following:

    Got the coordinates of each site centroid (x, y)

    Split the database into case and control groups

    Ran a spatial buffer around the cases

    Intersected the case buffers with the controls

    Assigned a label (match_no) to all intersections, one per case

    Randomly sampled controls within each match_no group

    Code below.

    library(sf)
    
    # Read the site table and convert it to an sf point layer (WGS84 lon/lat)
    db1 <- read.csv("db1_clf.csv")
    dat <- st_as_sf(x = db1,
                    coords = c("x_coor_farm", "y_coor_farm"),
                    crs = "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
    
    
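    ## Split the sites into cases (TB2017 positive) and controls
    library(dplyr)
    case <- dat %>% filter(TB2017 == "1")
    control <- dat %>% filter(TB2017 == "0")
    
    ## Buffer each case by 25 km. Note: with a lon/lat CRS, sf treats dist as
    ## metres only when s2 is enabled (sf_use_s2() is TRUE); otherwise project
    ## to a metric CRS with st_transform() before buffering.
    case_buff <- st_buffer(case, dist = 25000)
    
    ## Keep the controls that fall inside each case's buffer
    case_int <- st_intersection(case_buff, control)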
    
    ## One stratum label per case: every row sharing a case ID gets the same match_no
    case_int$match_no <- as.integer(factor(case_int$idunique))
    
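    ## Case IDs with their stratum label (geometry dropped)
    pos_db <- case_int %>%
      st_drop_geometry() %>%
      select(idunique, match_no) %>%
      distinct()
    
    ## Control IDs with their stratum label; st_intersection() gives the
    ## control's duplicated column names a ".1" suffix
    neg_db <- case_int %>%
      st_drop_geometry() %>%
      select(idunique.1, match_no) %>%
      distinct()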
    
    
    head(neg_db)
    
    
    ####Now the samples####
    library(dplyr)
    ## Draw 1 to 10 controls per stratum. sample_n(k) stops with an error when a
    ## stratum holds fewer than k controls; slice_sample(n = k) (dplyr >= 1.0.0)
    ## returns all available controls for that stratum instead.
    control_sets <- lapply(1:10, function(k) {
      neg_db %>% group_by(match_no) %>% slice_sample(n = k) %>% ungroup()
    })
    names(control_sets) <- paste0("control", 1:10)  # e.g. control_sets$control3
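
For anyone who wants the distance-matrix version asked about in the question, here is a minimal base R sketch of the same idea, carried through to the conditional logistic regression. It is only an illustration under stated assumptions: the toy data, the x/y columns (assumed to be projected coordinates in meters, so plain Euclidean distance is valid), the herd_size exposure, and the match_controls() helper are all hypothetical and not part of my real dataset. The model uses clogit() and strata() from the survival package.

```r
library(survival)  # clogit() and strata()

set.seed(42)

# Hypothetical toy data: 200 sites, coordinates already projected (metres).
n <- 200
sites <- data.frame(
  idunique  = seq_len(n),
  x         = runif(n, 0, 100000),
  y         = runif(n, 0, 100000),
  TB2017    = rbinom(n, 1, 0.15),
  herd_size = rnorm(n, mean = 100, sd = 20)  # made-up exposure variable
)
cases    <- sites[sites$TB2017 == 1, ]
controls <- sites[sites$TB2017 == 0, ]

# Case-by-control Euclidean distance matrix in metres.
dist_m <- sqrt(outer(cases$x, controls$x, "-")^2 +
               outer(cases$y, controls$y, "-")^2)

# Build matched sets with up to k controls per case within max_dist metres.
match_controls <- function(k, max_dist = 25000) {
  sets <- lapply(seq_len(nrow(cases)), function(i) {
    eligible <- which(dist_m[i, ] <= max_dist)
    if (length(eligible) == 0) return(NULL)  # no control in range: drop case
    # Index via sample.int() so a single eligible control is handled safely.
    picked <- eligible[sample.int(length(eligible), min(k, length(eligible)))]
    out <- rbind(cases[i, ], controls[picked, ])
    out$match_no <- i
    out
  })
  do.call(rbind, Filter(Negate(is.null), sets))
}

# One matched data set per ratio for the sensitivity analysis (1:1 to 1:8).
matched_sets <- lapply(1:8, match_controls)

# Conditional logistic regression on the 1:3 data, stratified by matched set.
matched <- matched_sets[[3]]
fit <- clogit(TB2017 ~ herd_size + strata(match_no), data = matched)
summary(fit)
```

As in the buffer approach, a control that lies within 25 km of several cases can be drawn into more than one stratum. If each control should be used only once, the sampling step would need to remove picked controls from the pool before moving to the next case.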