r variables split names

How to create a new categorical variable from a combined variable name


I'm working on the following dataframe:

  subject group_type session_code exposure beta_early_X beta_early_Y beta_early_Z gamma_peak_X
1       X1       GRP1            A  LOW-CNT         -6.4         -3.5         10.2         -7.7
2       X1       GRP1            A  LOW-EXP         -5.9         -3.8         11.4         -5.1
3       X1       GRP1            A HIGH-EXP         -2.1          1.1         12.8         -4.3
4       X2       GRP1            A  LOW-CNT          1.2          3.9         14.1         -2.5
5       X2       GRP1            A  LOW-EXP          1.7          5.2         14.5         -5.9
6       X2       GRP1            A HIGH-EXP          5.6          8.1         15.9         -1.7
7       X3       GRP1            A  LOW-CNT          0.2         -0.9          2.9          3.9
8       X3       GRP1            A  LOW-EXP         -2.9         -2.5          0.5          2.8
9       X3       GRP1            A HIGH-EXP         -2.3         -1.8          3.0          2.3
10      X4       GRP1            A  LOW-CNT         -3.1          3.6         12.7         -1.0
   gamma_peak_Y gamma_peak_Z
1          -3.8          7.2
2          -3.0          8.6
3          -1.9          8.9
4          -1.3          4.4
5          -2.7          4.6
6           0.4          8.4
7           2.8          6.7
8           1.9          4.9
9           2.5          4.4
10          4.7         12.1

I would like to split the suffix (X, Y, Z) off the variable names in which it appears and create a new, differently named variable that lists these suffixes. How can I do this?

Thanks in advance

Here is the dataset:

structure(list(
  subject = c("X1", "X1", "X1", "X2", "X2", "X2", "X3", "X3", "X3", "X4"),
  group_type = c("GRP1", "GRP1", "GRP1", "GRP1", "GRP1", "GRP1", "GRP1", "GRP1", "GRP1", "GRP1"),
  session_code = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A"),
  exposure = c("LOW-CNT", "LOW-EXP", "HIGH-EXP", "LOW-CNT", "LOW-EXP", "HIGH-EXP", "LOW-CNT", "LOW-EXP", "HIGH-EXP", "LOW-CNT"),
  beta_early_X = c(-6.4, -5.9, -2.1, 1.2, 1.7, 5.6, 0.2, -2.9, -2.3, -3.1),
  beta_early_Y = c(-3.5, -3.8, 1.1, 3.9, 5.2, 8.1, -0.9, -2.5, -1.8, 3.6),
  beta_early_Z = c(10.2, 11.4, 12.8, 14.1, 14.5, 15.9, 2.9, 0.5, 3.0, 12.7),
  gamma_peak_X = c(-7.7, -5.1, -4.3, -2.5, -5.9, -1.7, 3.9, 2.8, 2.3, -1.0),
  gamma_peak_Y = c(-3.8, -3.0, -1.9, -1.3, -2.7, 0.4, 2.8, 1.9, 2.5, 4.7),
  gamma_peak_Z = c(7.2, 8.6, 8.9, 4.4, 4.6, 8.4, 6.7, 4.9, 4.4, 12.1)
), class = "data.frame", row.names = c(NA, -10L))

Solution

  • We can use pivot_longer to reshape from 'wide' to 'long'. Select the cols whose names contain a ( followed by digits (\\d+), and split each name at the delimiter (.) by specifying names_sep. In names_to, .value keeps the part of the column name before the . as the name of a value column, and the new column 'electrode' holds the suffix that follows the .

    library(dplyr)
    library(tidyr)
    data %>% 
       pivot_longer(cols = matches("\\(\\d+-\\d+"), 
         names_to = c(".value", "electrode"), names_sep = "\\.")
    

    -output

    # A tibble: 24 × 9
       ID    GR    SES   COND    electrode `P3(400-450)` `LPPearly(500-700)` `LPP1(500-1000)` `LPP2(1000-1500)`
       <chr> <chr> <chr> <chr>   <chr>             <dbl>               <dbl>            <dbl>             <dbl>
     1 01    RP    V     NEG-CTR FCz             -11.6                -11.8            -5.67             -0.199
     2 01    RP    V     NEG-CTR Cz               -5.17                -5.96           -0.774             2.96 
     3 01    RP    V     NEG-CTR Pz               11.9                  8.24            9.99              6.28 
     4 01    RP    V     NEG-CTR POz              NA                   NA              NA                 7.91 
     5 01    RP    V     NEG-NOC FCz             -11.1                 -9.15           -4.39             -3.16 
     6 01    RP    V     NEG-NOC Cz               -5.53                -5.11           -0.650            -2.13 
     7 01    RP    V     NEG-NOC Pz               12.1                  9.51           11.1               5.25 
     8 01    RP    V     NEG-NOC POz              NA                   NA              NA                 9.95 
     9 01    RP    V     NEU-NOC FCz              -4.00                -7.58           -2.97              0.896
    10 01    RP    V     NEU-NOC Cz                0.622               -2.82            1.14              2.95 
    # … with 14 more rows
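
The column names in the question's dput end in _X, _Y or _Z rather than containing ( or ., so the same pivot_longer idea might be adapted as in the sketch below. It assumes the suffix is always a single letter X, Y or Z after the last underscore. The names df, long and axis are illustrative choices, not from the original post, and only the first three rows of the posted data are used to keep the example short.

```r
library(dplyr)
library(tidyr)

# First three rows of the data frame posted in the question
df <- data.frame(
  subject      = c("X1", "X1", "X1"),
  group_type   = c("GRP1", "GRP1", "GRP1"),
  session_code = c("A", "A", "A"),
  exposure     = c("LOW-CNT", "LOW-EXP", "HIGH-EXP"),
  beta_early_X = c(-6.4, -5.9, -2.1),
  beta_early_Y = c(-3.5, -3.8, 1.1),
  beta_early_Z = c(10.2, 11.4, 12.8),
  gamma_peak_X = c(-7.7, -5.1, -4.3),
  gamma_peak_Y = c(-3.8, -3.0, -1.9),
  gamma_peak_Z = c(7.2, 8.6, 8.9)
)

long <- df %>%
  pivot_longer(
    # Only the columns ending in _X, _Y or _Z (case-sensitive)
    cols = matches("_[XYZ]$", ignore.case = FALSE),
    # ".value": the first capture group becomes the value column name;
    # "axis" (hypothetical name): the second group becomes a new column
    names_to = c(".value", "axis"),
    names_pattern = "(.+)_([XYZ])$"
  ) %>%
  # Store the suffix as a categorical variable (factor)
  mutate(axis = factor(axis, levels = c("X", "Y", "Z")))

long
```

Each original row becomes three rows, one per suffix, with the measurements collected in beta_early and gamma_peak. names_pattern is used here instead of names_sep because the prefix itself contains underscores, so splitting on every _ would produce too many pieces.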