rdplyrpipe

Dynamically concatenate columns in data frame in pipes


I have a process where I am merging dataframes together, and then generating an ID variable by concatenating some of the columns together. However, the names of the columns that need to be concatenated are in a character vector.

I can do this in separate steps like this:

new <- merge(old1, old2, by = "MergeVar")

id <- do.call(paste0, new[idcols])

new$id <- id

However, I'd like to do this using pipes. If the columns were fixed, then I know I could do something like this:

new <- merge(old1, old2, by = "MergeVar") %>%
  mutate(id = paste0(colA, colB))

Is there a nice way to merge these two approaches together so that I can keep the pipe structure but reference the column names dynamically?


Solution

  • Assume the test data shown in the Note at the end. BOD comes with R.

    1) With base R we use merge and then create a list with a single x component holding the merge result and then use that in the last leg of the pipe.

    old1 |>
      merge(old2, by = by_var) |>
      list(x = _) |>
      with(transform(x, id = do.call("paste0", x[idcols])))
    ##   Time demand.x demand.y       id
    ## 1    1      8.3      8.3   8.38.3
    ## 2    2     10.3     10.3 10.310.3
    ## 3    3     19.0     19.0     1919
    

    2) With dplyr we can use left_join and use !! and !!! to insert the variables.

    library(dplyr)
    
    old1 %>%
      left_join(old2, join_by(!!sym(by_var))) %>%
      mutate(id = paste0(!!!syms(idcols)))
    ##   Time demand.x demand.y       id
    ## 1    1      8.3      8.3   8.38.3
    ## 2    2     10.3     10.3 10.310.3
    ## 3    3     19.0     19.0     1919
    

    3) With magrittr but no other packages we can rewrite (1) as:

    library(magrittr)
    
    old1 %>%
      merge(old2, by = by_var) %>%
      transform(id = do.call("paste0", .[idcols]))
    ##   Time demand.x demand.y       id
    ## 1    1      8.3      8.3   8.38.3
    ## 2    2     10.3     10.3 10.310.3
    ## 3    3     19.0     19.0     1919
    

    Note

    old1 <- old2 <- head(BOD, 3)
    by_var <- "Time"
    idcols <- c("demand.x", "demand.y")