r dplyr

Aggregate by group across logical values


I have this data frame:

library(dplyr)  # provides tibble()

dataf <- tibble(A = sample(c(TRUE, FALSE), 10, replace = TRUE),
                B = sample(c(TRUE, FALSE), 10, replace = TRUE),
                C = sample(c(TRUE, FALSE), 10, replace = TRUE),
                group = c(rep("grp1", 3), rep("grp2", 3), rep("grp3", 4)))

> dataf
# A tibble: 10 × 4
   A     B     C     group
   <lgl> <lgl> <lgl> <chr>
 1 TRUE  TRUE  TRUE  grp1 
 2 FALSE TRUE  TRUE  grp1 
 3 TRUE  TRUE  TRUE  grp1 
 4 TRUE  TRUE  TRUE  grp2 
 5 FALSE TRUE  TRUE  grp2 
 6 TRUE  FALSE TRUE  grp2 
 7 TRUE  FALSE FALSE grp3 
 8 TRUE  FALSE TRUE  grp3 
 9 FALSE FALSE TRUE  grp3 
10 FALSE FALSE FALSE grp3 

I want to aggregate the rows by the variable group. If a column contains at least one TRUE within a group, the aggregated value should be TRUE; otherwise it should be FALSE. For example, in grp1 column A has TRUE, FALSE and TRUE. Since it contains a TRUE, the aggregate for grp1, column A should be TRUE. Similarly, grp3, column B should be FALSE, as it contains no TRUE.

The resulting data frame should look like this:

  A     B     C     group
  <lgl> <lgl> <lgl> <chr>
1 TRUE  TRUE  TRUE  grp1
2 TRUE  TRUE  TRUE  grp2
3 TRUE  FALSE TRUE  grp3

Any idea how to achieve this?


Solution

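  • 1) dplyr. Using the input in the Note at the end, use across with any to collapse each logical column to a single value per group, then move the group column to the last position. The .by argument of summarize requires dplyr 1.1.0 or later.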

    library(dplyr)
    
    dataf %>%
      summarize(across(where(is.logical), any), .by = group) %>%
      relocate(group, .after = last_col())
    

    giving

         A     B    C group
    1 TRUE  TRUE TRUE  grp1
    2 TRUE  TRUE TRUE  grp2
    3 TRUE FALSE TRUE  grp3
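
    For dplyr versions older than 1.1.0, where .by is not available, the same idea can be sketched with group_by. This is a minimal sketch rather than part of the original answer: the data frame is rebuilt by hand from the Note so the block runs on its own, and res is just an illustrative name.

    ```r
    library(dplyr)

    # Toy input copied from the Note (rebuilt here so the block is self-contained)
    dataf <- data.frame(
      A = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE),
      B = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
      C = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE),
      group = rep(c("grp1", "grp2", "grp3"), times = c(3, 3, 4))
    )

    res <- dataf %>%
      group_by(group) %>%                                  # explicit grouping instead of .by
      summarize(across(where(is.logical), any),            # TRUE if any value in the group is TRUE
                .groups = "drop") %>%                      # return an ungrouped result
      relocate(group, .after = last_col())                 # put group last, as requested

    res
    ```

    If the logical columns may contain NA, passing na.rm = TRUE through an anonymous function, e.g. across(where(is.logical), function(x) any(x, na.rm = TRUE)), keeps a group without any TRUE from turning into NA.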
    

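    2) Base R. Alternatively, using only base R, aggregate applies any to every column within each group; the index c(2:4, 1) then moves group to the last column: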

    aggregate(. ~ group, dataf, any)[c(2:4, 1)]
    

    giving

         A     B    C group
    1 TRUE  TRUE TRUE  grp1
    2 TRUE  TRUE TRUE  grp2
    3 TRUE FALSE TRUE  grp3
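
    To see what aggregate is doing per group, the same computation can be spelled out with tapply. This is only an illustrative sketch under the assumption that the input matches the Note; b_by_group and m are hypothetical names introduced here.

    ```r
    # Toy input copied from the Note (rebuilt here so the block is self-contained)
    dataf <- data.frame(
      A = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE),
      B = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
      C = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE),
      group = rep(c("grp1", "grp2", "grp3"), times = c(3, 3, 4))
    )

    # One column: does each group contain at least one TRUE in B?
    b_by_group <- tapply(dataf$B, dataf$group, any)
    b_by_group  # grp1 and grp2 are TRUE, grp3 is FALSE

    # All logical columns: a groups-by-columns logical matrix
    m <- sapply(dataf[c("A", "B", "C")],
                function(col) tapply(col, dataf$group, any))
    m
    ```

    One difference worth knowing: the formula method of aggregate drops rows containing NA by default (na.action = na.omit), whereas tapply passes them through to any.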
    

    Note

    The code in the question does not produce a reproducible dataf, because it draws random values without calling set.seed(...), so we have used the following input instead.

    Lines <- " A     B     C     group
     1 TRUE  TRUE  TRUE  grp1 
     2 FALSE TRUE  TRUE  grp1 
     3 TRUE  TRUE  TRUE  grp1 
     4 TRUE  TRUE  TRUE  grp2 
     5 FALSE TRUE  TRUE  grp2 
     6 TRUE  FALSE TRUE  grp2 
     7 TRUE  FALSE FALSE grp3 
     8 TRUE  FALSE TRUE  grp3 
     9 FALSE FALSE TRUE  grp3 
    10 FALSE FALSE  FALSE grp3 "
    dataf <- read.table(text = Lines)
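
    As a side note, the question's random input could be made reproducible by fixing the seed first. The sketch below assumes an arbitrary seed of 42 and a hypothetical helper make_data; it uses a plain data.frame rather than a tibble so that it needs only base R.

    ```r
    # Hypothetical helper that mimics the sampling code in the question
    make_data <- function() {
      data.frame(
        A = sample(c(TRUE, FALSE), 10, replace = TRUE),
        B = sample(c(TRUE, FALSE), 10, replace = TRUE),
        C = sample(c(TRUE, FALSE), 10, replace = TRUE),
        group = c(rep("grp1", 3), rep("grp2", 3), rep("grp3", 4))
      )
    }

    set.seed(42)   # arbitrary seed, chosen only for illustration
    d1 <- make_data()

    set.seed(42)   # resetting the same seed reproduces the same draws
    d2 <- make_data()

    identical(d1, d2)
    ```

    The exact values drawn depend on the R version's random number generator settings, so posting the data itself, as done above with read.table, remains the more portable option.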