rdplyrtidyversetidyrforcats

Collapse groups according to a list of codes


I have a table of comma separated codes that should be collapsed

group_codes <-  
 tibble::tribble(
    ~group_codes,
  "AAA11, AAA1Z",
  "ZZZ21, ZZZ2Z"
  )

And a table on which the collapsing and summing should be run:

tibble::tribble(
   ~codes, ~values,
  "AAA11",     10,
  "AAA1Z",     20,
   "CCC3",     34,
  "ZZZ21",     10,
  "ZZZ2Z",     30
  )

The third row will stay intact, it is not in the list of codes to be collapsed.

The expected result is the collapse of codes in a list and sum their values.

tibble::tribble(
          ~codes, ~values,
   "AAA11,AAA1Z",     30L,
          "CCC3",     34L,
  "ZZZ21, ZZZ2Z",     40L
  )

I am expecting to run the collapse after a group-by. The codes will not repeat. They may in most cases follow each other but the order is unknown (data comes from a manual entry form. What would be best tidyverse approach?


Solution

  • We may split the column in group_codes, do a join and summarise

    library(dplyr)
    library(tidyr)
    group_codes %>%
       mutate(codes = group_codes) %>%
       separate_rows(codes) %>%
       left_join(df1, ., by = "codes") %>%
       transmute(codes = coalesce(group_codes, codes), values) %>% 
       group_by(codes) %>% 
       summarise(values = sum(values, na.rm = TRUE), .groups = 'drop')
    

    -output

    # A tibble: 3 × 2
      codes        values
      <chr>         <dbl>
    1 AAA11, AAA1Z     30
    2 CCC3             34
    3 ZZZ21, ZZZ2Z     40