rdplyr

Return values from multiple columns to a new column where the last character in column names == last character of values in reference column


I'm needing a more flexible method to extract values from multiple columns into column "y" where the last character in "x" == the last character of the column name. I can achieve this using a nested ifelse() but the total number of columns is not known.

There are multiple column names ending with the same integer, so I anticipate that a string prefix is needed. In the example below, I'm interested in the "a" columns.

At this stage I would prefer a string match approach where the original column order is preserved, but a column index approach that reorders the "a" columns would also be of interest. I searched SO but could not find a solution that I could adapt.

library(dplyr)
library(stringr)

# Sample data
df <- data.frame(a2 = c(rep(letters[1:2], 2), "a"),
                 a3 = c(rep(letters[2:1], 2), "a"),
                 a1 = c(rep(letters[1:2], each = 2), "a"),
                 b1 = 1:5,
                 b2 = 6:10,
                 x = paste0("x", c(1,2,2,3,4)))

# Inflexible approach
df %>% mutate(y = ifelse(str_sub(x, start = -1) == 1, a1, 
                         ifelse(str_sub(x, start = -1) == 2, a2,
                                ifelse(str_sub(x, start = -1) == 3, a3, NA))))

# Result
  a2 a3 a1 b1 b2  x    y
1  a  b  a  1  6 x1    a
2  b  a  a  2  7 x2    b
3  a  b  b  3  8 x2    a
4  b  a  b  4  9 x3    a
5  a  a  a  5 10 x4 <NA>

Solution

  • This is similar/equivalent to Yifu Han's answer on the linked question:

    df |> mutate(y = imap(x, ~df[.y, str_replace(.x, "x", "a")] %>% ifelse(is.null(.), NA, .)))