How to Nest Data for Pairwise Group Comparisons in dplyr?


I often use nest() (from tidyr, loaded with the tidyverse) together with dplyr to fit models on nested tibbles. For example, to test within each group whether the value changes over time:

library(tidyverse)
library(lme4)
set.seed(123)

data <- tibble(
  ID = rep(1:50, each = 2),            
  time = rep(c(0, 1), times = 50),
  group = rep(sample(c("A", "B", "C"), 50, replace = TRUE), each = 2),
  value = runif(100)
)

data %>% 
  group_by(group) %>% 
  nest() %>% 
  mutate(lmm = map(data, \(x) lme4::lmer(value ~ time + (1|ID),
                                         data = x)))

Now, I want to compare how groups differ over time by excluding one group at a time. Specifically, I want to fit the model value ~ time * group + (1 | ID) for all combinations of two groups.

Is it possible to nest the data such that each row in the nested tibble represents combinations of two groups? How can this be achieved efficiently in dplyr?


Solution

  • If I understand correctly, you could use combn() to build a tibble with one row per pairwise combination of groups, then map over it:

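    # One row per unordered pair of groups
    mdl_results <- tibble(groups = combn(unique(data$group), 2, simplify = FALSE)) %>%
      mutate(
        # Fit the interaction model on the rows belonging to the current pair
        lmm = purrr::map(groups, ~ lme4::lmer(value ~ time * group + (1 | ID),
                                              data = filter(data, group %in% .x))),
        # Keep the model summaries alongside the fits
        summary = purrr::map(lmm, summary)
      )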
    
    #   groups    lmm       summary   
    #   <list>    <list>    <list>    
    #   1 <chr [2]> <lmerMod> <smmry.mM>
    #   2 <chr [2]> <lmerMod> <smmry.mM>
    #   3 <chr [2]> <lmerMod> <smmry.mM>
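
If you would rather have the subset itself stored in each row, as nest() does, one option is to build the list-column by hand. This is only a minimal sketch under the question's setup: the names group_pairs, nested_pairs, pair, pair_data and time_by_group are made up for illustration, and the data are re-simulated so the block runs on its own.

```r
library(dplyr)
library(purrr)
library(lme4)

set.seed(123)

# Same simulated data as in the question
data <- tibble(
  ID = rep(1:50, each = 2),
  time = rep(c(0, 1), times = 50),
  group = rep(sample(c("A", "B", "C"), 50, replace = TRUE), each = 2),
  value = runif(100)
)

# All unordered pairs of group labels (hypothetical helper name)
group_pairs <- combn(sort(unique(data$group)), 2, simplify = FALSE)

# One row per pair, with the matching subset stored in a list-column
nested_pairs <- tibble(
  pair = map_chr(group_pairs, \(g) paste(g, collapse = " vs ")),
  pair_data = map(group_pairs, \(g) filter(data, group %in% g))
)

# Fit the model per pair and pull out the time-by-group interaction estimate
nested_pairs <- nested_pairs %>%
  mutate(
    lmm = map(pair_data, \(d) lmer(value ~ time * group + (1 | ID), data = d)),
    time_by_group = map_dbl(lmm, \(m) {
      fx <- fixef(m)
      unname(fx[startsWith(names(fx), "time:group")])
    })
  )
```

Each pair_data element holds only the two groups in that pair, so the model formula can stay the same as for the full data. With purely random values like these, lmer may report a singular fit; that comes from the simulated data rather than from the nesting.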
    

    Note that this only addresses the programming side. I agree with the comment that you may want to ask about the statistical validity of the approach on Cross Validated. Good luck!