[SOLVED] Get unique values from rows with comma separated values based on a specific category column in R

Get unique values from rows with comma separated values based on a specific category column in R

Lets say I have:

group     X                 Y                   Z
A         cat, dog          dog, fox        
A         fox, chicken      dog, fox, chicken
A
B         fox, dog 
B         fox
B                                               bunny

I want to summarize unique animals per group based on these three columns, as

group     animal
A         cat, dog, fox, chicken
B         bunny, dog, fox

The order of animals doens't matter, as long as they are the unique values. I tried:

df %>%  
group_by(group) %>% 
separate_rows("X", sep=",")   %>%  
distinct %>%    
summarise(X = toString(X))

Which does not work even for one column (I get a bunch of NAs)

I was thinking something such as

df %>%   group_by(group) %>%   summarise_at(vars("X","Y","Z"), sum)

Which works for numerical variables not separated by comma (one value per row)

Solution

One way would be:

df %>%
   pivot_longer(-group)%>%
   separate_rows(value)%>%
   summarise(animal = toString(unique(value[nzchar(value)])), .by = group)

# A tibble: 2 × 2
  group animal                
  <chr> <chr>                 
1 A     cat, dog, fox, chicken
2 B     fox, dog, bunny

In base R you could do:

fn <- function(x) {
  toString(unique(scan(text=x, what="", sep=',', strip.white = TRUE, quiet = TRUE)))
}
aggregate(values~group, cbind(df[1], stack(df,-1)), fn)

  group                 values
1     A cat, dog, fox, chicken
2     B        fox, dog, bunny