rdplyrtidyrreshape2

Reshaping data into long forma


I have two large data frames, the sample below shows what they look like:

Name<-c("ST01","ST02","ST03")
B1<-c(0.01,0.003,0.04)
B2<-c(0.02,0.002,0.02)
B3<-c(0.04,0.008,0.06)
B4<-c(0.08,0.009,0.08)
B5<-c(0.09,0.004,0.05)

df1<-data.frame(Name,B1,B2,B3,B4,B5)
 
 Name    A1    A2    A3    A4   A5
1 ST01 0.020 0.040 0.050 0.060 0.08
2 ST02 0.036 0.034 0.054 0.034 0.24
3 ST03 0.140 0.320 0.560 0.780 0.59

Name<-c("ST01","ST02","ST03")
A1<-c(0.02,0.036,0.14)
A2<-c(0.04,0.034,0.32)
A3<-c(0.05,0.054,0.56)
A4<-c(0.06,0.034,0.78)
A5<-c(0.08,0.24,0.59)

df2<-data.frame(Name,A1,A2,A3,A4,A5)

 Name    B1    B2    B3    B4    B5
1 ST01 0.010 0.020 0.040 0.080 0.090
2 ST02 0.003 0.002 0.008 0.009 0.004
3 ST03 0.040 0.020 0.060 0.080 0.050

I would like to reshape them so that I can use ggplot to plot all the As against Bs and then plot As against Bs for each value, my desired output would be something like this:

  Name Value     A     B
1  ST01     1 0.020 0.010
2  ST01     2 0.040 0.020
3  ST01     3 0.050 0.040
4  ST01     4 0.060 0.080
5  ST01     5 0.080 0.090
6  STO2     1 0.036 0.003
7  STO2     2 0.034 0.002
8  STO2     3 0.054 0.008
9  STO2     4 0.034 0.009
10 STO2     5 0.240 0.004
11 ST03     1 0.140 0.040
12 ST03     2 0.320 0.020
13 ST03     3 0.560 0.060
14 ST03     4 0.780 0.080
15 ST03     5 0.590 0.050
> 

I am new to R so not quite if this can be done, or if there is a more appropriate way to shape the data in order to plot. Any help would be appreciated.


Solution

  • You could first merge your datasets using a left_join then reshape to long by utilizing the names_pattern argument of pivot_longer and the special ".value". For the names_pattern argument I use the regex "(.)(.)" to split the column names into two one character strings, e.g. "A1" is split in to "A" and "1" where the first part ("A") gives the name of the value column and the second part ("1") is stored as category in the name column ("Value"):

    library(dplyr, warn = FALSE)
    library(tidyr)
    
    left_join(df2, df1, by = "Name") |>
      pivot_longer(-Name, names_to = c(".value", "Value"), names_pattern = "(.)(.)")
    #> # A tibble: 15 × 4
    #>    Name  Value     A     B
    #>    <chr> <chr> <dbl> <dbl>
    #>  1 ST01  1     0.02  0.01 
    #>  2 ST01  2     0.04  0.02 
    #>  3 ST01  3     0.05  0.04 
    #>  4 ST01  4     0.06  0.08 
    #>  5 ST01  5     0.08  0.09 
    #>  6 ST02  1     0.036 0.003
    #>  7 ST02  2     0.034 0.002
    #>  8 ST02  3     0.054 0.008
    #>  9 ST02  4     0.034 0.009
    #> 10 ST02  5     0.24  0.004
    #> 11 ST03  1     0.14  0.04 
    #> 12 ST03  2     0.32  0.02 
    #> 13 ST03  3     0.56  0.06 
    #> 14 ST03  4     0.78  0.08 
    #> 15 ST03  5     0.59  0.05