rdataframedplyrgroup-summaries

Combine rows in data frame without grouping


tibble::tibble(
  col1 = c("A","","C","",""),
  col2 = c("string1 part 1","string1 part 2",
           "string2 part 1", "string2 part 2",
           "string3"),
  col3 = c(1, "", 2, "", 3)
)

I'd like to merge the rows in col2, obtaining a tibble without empty strings like this

tibble::tibble(
  col1 = c("A","C",NA),
  col2 = c("string1 part 1 string1 part 2", "string2 part 1 string2 part 2", "string3"),
  col3 = c(1,2,3)
)

# A tibble: 3 x 3
  col1  col2                           col3
  <chr> <chr>                         <dbl>
1 A     string1 part 1 string1 part 2     1
2 C     string2 part 1 string2 part 2     2
3 NA    string3                           3

Grouping, as I found in other answers, does not seems to be an option while I don't have any reference column


Solution

  • This works with dplyr

    tibble::tibble(
      col1 = c("A","","C","",""),
      col2 = c("string1 part 1","string1 part 2",
               "string2 part 1", "string2 part 2",
               "string3"),
      col3 = c(1, "", 2, "", 3)
    ) %>% 
    # fill empty values on col3
    mutate(col3 = case_when(
        col3!="" ~ 1,
        T ~ 0
    )) %>%
    mutate(col3 = cumsum(col3)) %>%
    # fill empty values on col1
    group_by(col3) %>%
    mutate(col1 = first(col1)) %>%
    # group & summarise
    group_by(col1, col3) %>%
    summarise(col2 = paste(col2, collapse=' ')) %>%
    # replace empty string by NA & arrange by col3
    ungroup() %>%
    mutate(col1 = case_when(
        col1=="" ~ as.character(NA),
        T ~ col1
    )) %>%
    arrange(col3) %>%
    select(col1, col2, col3)
    

    Output :

    # A tibble: 3 x 3
    #  col1  col2                           col3
    #  <chr> <chr>                         <dbl>
    #1 A     string1 part 1 string1 part 2     1
    #2 C     string2 part 1 string2 part 2     2
    #3 <NA>  string3                           3