rdplyrtidyr

Quick way to combine several pairs of columns into one column for each pair


In R I am trying to combine pairs of columns in order to find the maximum value between the pairs of columns.

My dataframe is structured like this:

df <- data.frame(id = c("u1", "u1", "u2", "u2"),
                 visit = c("v1", "v2", "v1", "v2"),
                 a1 = c(1, 3, 5, 7),
                 b1 = c(11, 31, 51, 71),
                 a2 = c(2, 4, 6, 8),
                 b2 = c(21, 41, 61, 81))

Where each id has had two visits where they performed two sets of the same test and two sets of a different test. My thought was that i could combine the columns like so (order doesn't matter):

data.frame(id = c("u1", "u1", "u2", "u2", "u1", "u1", "u2", "u2"),
                 visit = c("v1", "v2", "v1", "v2", "v1", "v2", "v1", "v2"),
                 a = c(1, 3, 5, 7, 2, 4, 6, 8),
                 b = c(11, 31, 51, 71, 21, 41, 61, 81))

And from there I could find the max values for each id for each visit using group_by and make a pairwise t-test based on max vaulues.

Here is the code i tried to use to combine pairs of columns into columns.

df_long <- df %>%
  pivot_longer(cols = starts_with("a"), 
               names_to = c("variable"), 
               values_to = "a") %>%
  pivot_longer(cols = starts_with("b"), 
               names_to = "variable2", 
               values_to = "b") %>%
  select(id, visit, a, b)

The pivot_longer works fine for combining two columns (i.e. for example only doing the columns with a), but every value is shown twice when i try the above and thrice when if I add another pivot_longer. I know i can do this one test at a time but that would be pretty time comsuming seeing as i have a large number of tests, and i figure there has to be a better way of doing this.


Solution

  • Here's an option using the tidyverse approach:

    library(tidyverse)
    df |>
      pivot_longer(cols = -c(id, visit),
                   names_pattern = "(.)(\\d+)",
                   names_to = c(".value", "number")) |> 
      arrange(number)
    
    # A tibble: 8 × 5
      id    visit number     a     b
      <chr> <chr> <chr>  <dbl> <dbl>
    1 u1    v1    1          1    11
    2 u1    v2    1          3    31
    3 u2    v1    1          5    51
    4 u2    v2    1          7    71
    5 u1    v1    2          2    21
    6 u1    v2    2          4    41
    7 u2    v1    2          6    61
    8 u2    v2    2          8    81
    

    The idea here is to use pivot_longers names_pattern argument along with the .value placeholder in names_to to create a regex that defines the sets of variables that you want to put together.

    In this particular case we use the regex to split the column names a1, a2, b1 and b2 into their letter (.) and the number (\\d+) and tell the names_to argument that the letter part should remain as columns ".value" whereas the digit at the end should be the part that will be reshaped into long format (I kept that information in the column number).