I have this data frame:
library(tibble)
dataf <- tibble(A = sample(c(TRUE, FALSE), 10, replace = TRUE),
                B = sample(c(TRUE, FALSE), 10, replace = TRUE),
                C = sample(c(TRUE, FALSE), 10, replace = TRUE),
                group = c(rep("grp1", 3), rep("grp2", 3), rep("grp3", 4)))
dataf
# A tibble: 10 × 4
   A     B     C     group
   <lgl> <lgl> <lgl> <chr>
 1 TRUE  TRUE  TRUE  grp1
 2 FALSE TRUE  TRUE  grp1
 3 TRUE  TRUE  TRUE  grp1
 4 TRUE  TRUE  TRUE  grp2
 5 FALSE TRUE  TRUE  grp2
 6 TRUE  FALSE TRUE  grp2
 7 TRUE  FALSE FALSE grp3
 8 TRUE  FALSE TRUE  grp3
 9 FALSE FALSE TRUE  grp3
10 FALSE FALSE FALSE grp3
I want to aggregate the rows by the variable group. If a column contains at least one TRUE within a group, the aggregated value should be TRUE; otherwise it should be FALSE. E.g. in grp1, column A has TRUE, FALSE and TRUE. Since it contains a TRUE, the aggregate for grp1, column A should be TRUE. Similarly, grp3, column B should be FALSE, as it doesn't contain any TRUE.
The resulting data frame should look like this:
  A     B     C     group
  <lgl> <lgl> <lgl> <chr>
1 TRUE  TRUE  TRUE  grp1
2 TRUE  TRUE  TRUE  grp2
3 TRUE  FALSE TRUE  grp3
Any idea how to achieve this?
1) dplyr Using the input in the Note at the end, use across with any inside summarize, grouping by group. Then move the group column to be the last column.
library(dplyr)
dataf %>%
summarize(across(where(is.logical), any), .by = group) %>%
relocate(group, .after = last_col())
giving
     A     B    C group
1 TRUE  TRUE TRUE  grp1
2 TRUE  TRUE TRUE  grp2
3 TRUE FALSE TRUE  grp3
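The .by argument of summarize was added in dplyr 1.1.0. For older versions, the same aggregation can be sketched with an explicit group_by. This is only a sketch: the data frame below is typed in by hand to match the rows shown in the question, and res is a name introduced here for illustration.

```r
library(dplyr)

# Hand-typed copy of the question's data (an assumption for self-containment).
dataf <- data.frame(
  A = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE),
  B = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  C = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE),
  group = rep(c("grp1", "grp2", "grp3"), times = c(3, 3, 4))
)

res <- dataf %>%
  group_by(group) %>%                                   # explicit grouping instead of .by
  summarize(across(where(is.logical), any),             # TRUE if any value in the group is TRUE
            .groups = "drop") %>%                       # return an ungrouped result
  relocate(group, .after = last_col())                  # move group to the last column

res
```

Here group_by followed by .groups = "drop" plays the role of .by: the grouping applies only to this one summarize call and the result comes back ungrouped.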
2) Base R Alternatively, using only base R:
aggregate(. ~ group, dataf, any)[c(2:4, 1)]
giving
     A     B    C group
1 TRUE  TRUE TRUE  grp1
2 TRUE  TRUE TRUE  grp2
3 TRUE FALSE TRUE  grp3
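To see why the index c(2:4, 1) is needed, it may help to split the one-liner into steps: aggregate puts the grouping column first, so the index only reorders the columns. A minimal sketch, with the data typed in by hand to match the question and agg and res as names introduced here:

```r
# Hand-typed copy of the question's data (an assumption for self-containment).
dataf <- data.frame(
  A = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE),
  B = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  C = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE),
  group = rep(c("grp1", "grp2", "grp3"), times = c(3, 3, 4))
)

# "." on the left-hand side stands for every column other than group.
agg <- aggregate(. ~ group, dataf, any)
names(agg)               # grouping column comes first: group, A, B, C

# Reorder columns so that group (column 1) goes last.
res <- agg[c(2:4, 1)]
res
```

If the column order does not matter, the reordering step can simply be dropped.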
Note
dataf as produced by the code in the question is not reproducible because it uses random numbers without set.seed(...), so we have used the following input instead.
Lines <- " A B C group
1 TRUE TRUE TRUE grp1
2 FALSE TRUE TRUE grp1
3 TRUE TRUE TRUE grp1
4 TRUE TRUE TRUE grp2
5 FALSE TRUE TRUE grp2
6 TRUE FALSE TRUE grp2
7 TRUE FALSE FALSE grp3
8 TRUE FALSE TRUE grp3
9 FALSE FALSE TRUE grp3
10 FALSE FALSE FALSE grp3 "
dataf <- read.table(text = Lines)
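If randomly generated input is preferred, setting a seed first makes it repeatable. The following is a minimal sketch under a few assumptions: the seed value 123 is arbitrary, make_dataf is a hypothetical helper introduced here, and a plain data.frame stands in for the tibble used in the question. The values it produces will not match the rows shown in the question.

```r
# Hypothetical helper: build the random data after fixing the seed.
make_dataf <- function(seed) {
  set.seed(seed)  # fix the random number generator state
  data.frame(
    A = sample(c(TRUE, FALSE), 10, replace = TRUE),
    B = sample(c(TRUE, FALSE), 10, replace = TRUE),
    C = sample(c(TRUE, FALSE), 10, replace = TRUE),
    group = rep(c("grp1", "grp2", "grp3"), times = c(3, 3, 4))
  )
}

d1 <- make_dataf(123)
d2 <- make_dataf(123)

# The same seed yields the same data on every call.
identical(d1, d2)
```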