Tags: r, combinations, combn

Using combn to make specific functions for grouped pair-wise, row-wise comparisons


This is a small section of a dataset I'm working on.

dat2 <- read.table(text = "
   nodepair  V1  V2  V3  V4  V5  V6  V7  V8  V9 ES   
 1 A1_A1        0    21     0     0     0     0     0     0    78 45   
 2 A2_A1        0     0     0     0     0     0     0     0    99 45   
 3 A2_A2        0     1     0     0     0     0     0     0    98 45   
 4 A3_A1        0     0     0     0     0     6     1     3    89 45   
 5 A3_A2        0     0     0     0     0     0     0     0    99 45   
 6 A1_A1        0    20     0     0     0     0     0     0    65 46   
 7 A2_A1        0     0     0     0     0     0     0     0    85 46   
 8 A2_A2        0     1     0     0     0     0     0     0    84 46   
 9 A3_A1        0     0     0     0     2     6     3     3    71 46   
 10 A3_A2        0     0     0     0     0     0     0     0    85 46   
 11 A1_A1        0    25     0     0     0     0     0     0    45 47   
 12 A2_A1        0     0     0     0     0     0     0     0    70 47   
 13 A2_A2        0     1     0     0     0     0     0     0    69 47   
 14 A3_A1        0     0     0     0     0     8     0     1    61 47   
 15 A3_A2        0     0     0     0     0     0     0     0    70 47   
 16 A1_A1        0    37     0     0     0     0     0     0    77 48   
 17 A2_A1        0     0     0     0     0     0     0     0   114 48   
 18 A2_A2        0     0     0     0     0     0     0     0   114 48   
 19 A3_A1        0     0     0     0     2     9     0     3   100 48   
 20 A3_A2        0     0     0     0     0     0     0     0   114 48   
 ", header = TRUE)

I'm trying to write a program that will do all pairwise comparisons (grouped by the nodepair) across the 'ES' groups.

I'd like to write a series of functions that each compare a pair of rows. For example, for each of the columns V1 to V9, when the value is > 0 in both ES rows, the result should be 1 (indicating presence of data), and 0 otherwise.

I'm imagining the output to look something like this:

 dat3 <- read.table(text = "
    nodepair1 nodepair2  V1  V2  V3  V4  V5  V6  V7  V8  V9
    A1_A1(45) A1_A1(46)   0   1   0   0   0   0   0   0   1
  ", header = TRUE)

etc.

Unfortunately, I haven't gotten very far:

 dat2 <- dat2 %>%
   group_by(nodepair) %>%
   col2 = t(combn(nodepair,2)))

I'm pretty sure I need 'combn' here, but I'm very new to this function and can't figure it out.


Solution

  • Now that the OP has clarified their question, I'd propose the following solution:

    library(tidyverse)
    
    # All unordered pairs of ES values, e.g. c(45, 46), c(45, 47), ...
    ES_combs <- combn(unique(dat2$ES), 2, simplify = FALSE)
    
    dat2 |> 
      group_split(nodepair) |>                  # one data frame per nodepair
      map(\(df) map(ES_combs, \(es_pair) df |>  # every ES pair within it
            filter(ES %in% es_pair) |> 
            summarize(nodepair = first(nodepair),
                      ES_1 = ES[1],
                      ES_2 = ES[2],
                      # 1 if the column is > 0 in both rows, else 0
                      across(V1:V9, ~as.numeric(all(.x > 0)))))) |> 
      bind_rows()
    

    which gives:

    # A tibble: 30 × 12
       nodepair  ES_1  ES_2    V1    V2    V3    V4    V5    V6    V7    V8    V9
       <chr>    <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 A1_A1       45    46     0     1     0     0     0     0     0     0     1
     2 A1_A1       45    47     0     1     0     0     0     0     0     0     1
     3 A1_A1       45    48     0     1     0     0     0     0     0     0     1
     4 A1_A1       46    47     0     1     0     0     0     0     0     0     1
     5 A1_A1       46    48     0     1     0     0     0     0     0     0     1
     6 A1_A1       47    48     0     1     0     0     0     0     0     0     1
     7 A2_A1       45    46     0     0     0     0     0     0     0     0     1
     8 A2_A1       45    47     0     0     0     0     0     0     0     0     1
     9 A2_A1       45    48     0     0     0     0     0     0     0     0     1
    10 A2_A1       46    47     0     0     0     0     0     0     0     0     1
    11 A2_A1       46    48     0     0     0     0     0     0     0     0     1
    12 A2_A1       47    48     0     0     0     0     0     0     0     0     1
    13 A2_A2       45    46     0     1     0     0     0     0     0     0     1
    14 A2_A2       45    47     0     1     0     0     0     0     0     0     1
    15 A2_A2       45    48     0     0     0     0     0     0     0     0     1
    16 A2_A2       46    47     0     1     0     0     0     0     0     0     1
    17 A2_A2       46    48     0     0     0     0     0     0     0     0     1
    18 A2_A2       47    48     0     0     0     0     0     0     0     0     1
    19 A3_A1       45    46     0     0     0     0     0     1     1     1     1
    20 A3_A1       45    47     0     0     0     0     0     1     0     1     1
    21 A3_A1       45    48     0     0     0     0     0     1     0     1     1
    22 A3_A1       46    47     0     0     0     0     0     1     0     1     1
    23 A3_A1       46    48     0     0     0     0     1     1     0     1     1
    24 A3_A1       47    48     0     0     0     0     0     1     0     1     1
    25 A3_A2       45    46     0     0     0     0     0     0     0     0     1
    26 A3_A2       45    47     0     0     0     0     0     0     0     0     1
    27 A3_A2       45    48     0     0     0     0     0     0     0     0     1
    28 A3_A2       46    47     0     0     0     0     0     0     0     0     1
    29 A3_A2       46    48     0     0     0     0     0     0     0     0     1
    30 A3_A2       47    48     0     0     0     0     0     0     0     0     1
    

    This probably needs a bit of explanation:
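
    One way to see what the pipeline does is to run its steps by hand on a single nodepair. The following is only a minimal base R sketch, not the code from the answer above: it uses the four A3_A1 rows copied from dat2 (restricted to V5 to V8 for brevity), and the helper name compare_pair is made up for illustration.

    ```r
    # Rows for nodepair A3_A1, copied from dat2 (only V5..V8 kept for brevity)
    a3_a1 <- data.frame(
      ES = c(45, 46, 47, 48),
      V5 = c(0, 2, 0, 2),
      V6 = c(6, 6, 8, 9),
      V7 = c(1, 3, 0, 0),
      V8 = c(3, 3, 1, 3)
    )

    # Step 1: every unordered pair of ES values; choose(4, 2) = 6 pairs,
    # returned as a list because simplify = FALSE
    es_pairs <- combn(unique(a3_a1$ES), 2, simplify = FALSE)

    # Step 2: for one pair, keep its two rows and flag each value column
    # with 1 if both rows are > 0, else 0 (compare_pair is a hypothetical helper)
    compare_pair <- function(df, es_pair) {
      rows <- df[df$ES %in% es_pair, ]
      value_cols <- setdiff(names(df), "ES")
      flags <- vapply(rows[value_cols],
                      function(col) as.numeric(all(col > 0)),
                      numeric(1))
      data.frame(ES_1 = es_pair[1], ES_2 = es_pair[2], as.list(flags))
    }

    # Step 3: repeat for all pairs and stack the one-row results
    result <- do.call(rbind, lapply(es_pairs, compare_pair, df = a3_a1))
    print(result)
    ```

    The six rows of result should agree with rows 19 to 24 of the tibble above for the columns V5 to V8. The tidyverse version does the same thing, with group_split() supplying one such data frame per nodepair.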