rdplyrdata-wranglinggrepl

Rearranging a R dataframe (changing to wide format based on certain conditions, renaming and reshuffling columns)


I have a dataframe that looks something like this:

example <- data.frame(
  date = c("6/1/22", "6/2/22", "6/3/22",
          "6/1/22", "6/2/22", "6/3/22",
          "6/1/22", "6/2/22", "6/3/22",
          "6/1/22", "6/2/22", "6/3/22"),
  sub = c(1101, 1101, 1101, 
          1102, 1102, 1102,
          2101, 2101, 2101,
          2102, 2102, 2102),
  express_p = c("eg1", "eg2", "eg3", "eg4",
                "eg5", "eg6", "eg7", "eg8",
                "eg9", "eg10", "eg11", "eg12"),
  p_express = c("a", "b", "c", "d",
                "e", "f", "g", "h",
                "i", "j", "k", "l")
)

I want to make it into a wider format and also reshuffle the column order. This is how the end result should look like:

example_clean <- data.frame(
  date = c("6/1/22", "6/2/22", "6/3/22", "6/1/22", "6/2/22", "6/3/22"),
  subA = c(1101, 1101, 1101, 1102, 1102, 1102),
  subB = c(2101, 2101, 2101, 2102, 2102, 2102),
  express_p_A = c("eg1", "eg2", "eg3", "eg7", "eg8", "eg9"),
  p_express_B = c("d", "e", "f", "j", "k", "l"),
  express_p_B = c("eg4", "eg5", "eg6", "eg10", "eg11", "eg12"),
  p_express_A = c("a", "b", "c", "g", "h", "i")
)

Essentially, I am pairing up all the numbers in sub that have the same 3 last digits to be in the saw row. Then, the order of the columns should also be reshuffled (and renamed) such that express_p for one subject is right beside p_express of its corresponding partner (e.g. 1101's express_p is to the left of 2101's p_express). Edit: It is also categorized by date.

Does anyone know an elegant way to do this?

Thank you!


Solution

  • You have to define two columns before pivoting: one for the prefix you're adding to the new columns (A and B) and one for identifying the groups of rows:

    library(tidyr)
    library(dplyr)
    example %>% 
      group_by(gp = sub('.', '', sub)) %>% 
      mutate(name = LETTERS[1:n()]) %>% 
      pivot_wider(values_from = sub:p_express)
    
      gp    sub_A sub_B express_p_A express_p_B p_express_A p_express_B
      <chr> <dbl> <dbl> <chr>       <chr>       <chr>       <chr>      
    1 101    1101  2101 eg1         eg3         a           c          
    2 102    1102  2102 eg2         eg4         b           d