Lets say I have:
group X Y Z
A cat, dog dog, fox
A fox, chicken dog, fox, chicken
A
B fox, dog
B fox
B bunny
I want to summarize unique animals per group based on these three columns, as
group animal
A cat, dog, fox, chicken
B bunny, dog, fox
The order of animals doens't matter, as long as they are the unique values. I tried:
df %>%
group_by(group) %>%
separate_rows("X", sep=",") %>%
distinct %>%
summarise(X = toString(X))
Which does not work even for one column (I get a bunch of NAs)
I was thinking something such as
df %>% group_by(group) %>% summarise_at(vars("X","Y","Z"), sum)
Which works for numerical variables not separated by comma (one value per row)
One way would be:
df %>%
pivot_longer(-group)%>%
separate_rows(value)%>%
summarise(animal = toString(unique(value[nzchar(value)])), .by = group)
# A tibble: 2 × 2
group animal
<chr> <chr>
1 A cat, dog, fox, chicken
2 B fox, dog, bunny
In base R you could do:
fn <- function(x) {
toString(unique(scan(text=x, what="", sep=',', strip.white = TRUE, quiet = TRUE)))
}
aggregate(values~group, cbind(df[1], stack(df,-1)), fn)
group values
1 A cat, dog, fox, chicken
2 B fox, dog, bunny