Lets assume, my data is like
group_id col1
1 1 A,B
2 1 B,C
3 2 A,C
4 2 B,D
5 3 A,D
6 3 A,B,C,D
I would like to summarise/mutate col1 where its elements are intersected within same group (over group_id). My required output is like (if summarising)
group_id col1
1 1 B
2 2 <NA>
3 3 A,D
or like this (if mutating)
group_id col1
1 1 B
2 1 B
3 2 <NA>
4 2 <NA>
5 3 A,D
6 3 A,D
I can easily create a union by using function toString
but scratching my head for how to have common elements in the output. Basically intersect
requires at least two arguments and is therefore not working here.
dput(df) is as under
df <- structure(list(group_id = c(1L, 1L, 2L, 2L, 3L, 3L), col1 = c("A,B",
"B,C", "A,C", "B,D", "A,D", "A,B,C,D")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
You can split col1
on comma and use Reduce
+ intersect
to get common values in each group_id
.
library(dplyr)
df %>%
group_by(group_id) %>%
summarise(col1 = toString(Reduce(intersect, strsplit(col1, ','))))
# group_id col1
#* <int> <chr>
#1 1 "B"
#2 2 ""
#3 3 "A, D"