rdataframefunctiondplyrtidyverse

Elongate data but also keep one of the original variable in R


I can long-format my DATA using: tidyr::pivot_longer(DATA, m:y, values_to= "z", names_to= "variable") which produces the elongated variable in my Desired_output.

But I wonder how I can get variable m to show the way shown in my Desired_output below such that for each unique id, variable m is once shown for variable=="y", and once again repeated for variable=="m" in tidyverse?

# A solution using reshape2 package is:

stacked <- reshape2::melt(DATA, id.vars = c("id", "x", "m"),
   measure.vars = c("y", "m"), value.name = "z")
Desired_output <- within(stacked, {
  sy <- as.integer(variable == "y")
  sm <- as.integer(variable == "m")
})
DATA <- read.csv("https://stats.idre.ucla.edu/stat/data/ml_sim.csv")
##    id       x       m      y
## 1   1  1.5451  0.1068  0.568
## 2   1  2.2753  2.1104  1.206
. . .

Desired_output:
##     fid id       x       m variable       z sm sy
## 1     1  1  1.5451  0.1068        y  0.5678  0  1
## 2     2  1  2.2753  2.1104        y  1.2061  0  1
## 3     3  1  0.7867  0.0389        y -0.2613  0  1
## 4     4  1 -0.0619  0.4794        y -0.7590  0  1
## 5     5  1  0.1173  0.5908        y  0.5191  0  1
## 6     6  1  1.4809  0.8909        y -0.6311  0  1
## 7     7  1  0.8928 -0.2277        y  0.1521  0  1
## 8     8  1  0.9246  0.7292        y  0.2346  0  1
## 801   1  1  1.5451  0.1068        m  0.1068  1  0
## 802   2  1  2.2753  2.1104        m  2.1104  1  0
## 803   3  1  0.7867  0.0389        m  0.0389  1  0
## 804   4  1 -0.0619  0.4794        m  0.4794  1  0
## 805   5  1  0.1173  0.5908        m  0.5908  1  0
## 806   6  1  1.4809  0.8909        m  0.8909  1  0
## 807   7  1  0.8928 -0.2277        m -0.2277  1  0
## 808   8  1  0.9246  0.7292        m  0.7292  1  0
. . .

Solution

  • DATA %>%
      mutate(m1=m, fid=row_number(), .by = id) %>%
      pivot_longer(y:m1, values_to = 'z', names_to = 'variable',
                   names_transform = ~str_remove(., '1')) %>%
      mutate(sm= +(variable == 'm'), sy=+(variable == 'y')) %>% 
      arrange(id, desc(variable))
    
    # A tibble: 1,600 × 8
          id       x       m   fid variable      z    sm    sy
       <int>   <dbl>   <dbl> <int> <chr>     <dbl> <int> <int>
     1     1  1.55    0.107      1 y         0.568     0     1
     2     1  2.28    2.11       2 y         1.21      0     1
     3     1  0.787   0.0389     3 y        -0.261     0     1
     4     1 -0.0619  0.479      4 y        -0.759     0     1
     5     1  0.117   0.591      5 y         0.519     0     1
     6     1  1.48    0.891      6 y        -0.631     0     1
     7     1  0.893  -0.228      7 y         0.152     0     1
     8     1  0.925   0.729      8 y         0.235     0     1
     9     1  1.55    0.107      1 m         0.107     1     0
    10     1  2.28    2.11       2 m         2.11      1     0