rdplyrdata.table

Ignoring NAs while aggregating character vector


I have this very simple dataset.

id <- c(12, 13, 26, 45, 55, 66)
group <- c("one", "one", "two", "two", "three", "three")
txt <- c("quick", "brown", NA, "fox", NA, NA)

df <- data.frame(id, group, txt)
df
> df
  id group   txt
1 12   one quick
2 13   one brown
3 26   two  <NA>
4 45   two   fox
5 55 three  <NA>
6 66 three  <NA>

I want to aggregate by group. I am trying this,

df |> 
  dplyr::group_by(group) |> 
  dplyr::summarise(txt = paste(txt, collapse = ", "))

# A tibble: 3 × 2
  group txt         
  <chr> <chr>       
1 one   quick, brown
2 three NA, NA      
3 two   NA, fox 

As we see the NA are outed as characters. Secondly, three comes after one in the group column. I want to have something like this,

  group          txt
1   one quick, brown
2   two          fox
3 three         <NA>

Any idea how to fix this simple problem?


Solution

  • You may either handle it yourself by code -

    library(dplyr)
    
    df |> 
      dplyr::summarise(txt = if(all(is.na(txt))) NA else toString(na.omit(txt)), .by = group)
    
       group          txt
    #1   one quick, brown
    #2   two          fox
    #3 three         <NA>
    

    Or use hablar::max_ which does the same thing.

    df |> 
      dplyr::summarise(txt = toString(hablar::max_(txt)), .by = group)