I have a dataset in wide format with no time variable and I would like to create a time variable and turn it into long format for longitudinal analysis. The original dataset looks like this:
id <- c(1,2,3)
pdq1 <- c(3,5,6)
pdq2 <- c(1,3,4)
pdq3 <- c(4,5,4)
scor_abp1 <- c(1,2,3)
scor_abp2 <- c(2,2,4)
scor_abp3 <- c(1,4,5)
dat <- data.frame(id,pdq1,pdq2,pdq3,scor_abp1,scor_abp2,scor_abp3)
Desired output:
id <- c(1,1,1,2,2,2,3,3,3)
time <- c(1,2,3,1,2,3,1,2,3)
pdq <- c(3,1,4,5,3,5,6,4,4)
abp <- c(1,2,1,2,2,4,3,4,5)
dat <- data.frame(id,time,pdq,abp)
pdq and scor_abp are two different measures. A variable with suffix 1 was measured at Time 1, a variable with suffix 2 at Time 2, and so on.
I would appreciate all the help there is!!! Thanks in advance!
When you have multiple columns to pivot, you can use .value in pivot_longer. In names_to, ".value" stands for the part of each column name that becomes an output column (here pdq or scor_abp), and the digit that follows it goes into the time column. Both pieces are set up through the names_to and names_pattern arguments.
tidyr::pivot_longer(dat,
                    -id,
                    names_to = c(".value", "time"),
                    names_pattern = "(pdq|scor_abp)(\\d)")
# A tibble: 9 × 4
     id time    pdq scor_abp
  <dbl> <chr> <dbl>    <dbl>
1     1 1         3        1
2     1 2         1        2
3     1 3         4        1
4     2 1         5        2
5     2 2         3        2
6     2 3         5        4
7     3 1         6        3
8     3 2         4        4
9     3 3         4        5
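In the output above, time is a character column and the second measure is still called scor_abp, whereas the desired output in the question has a numeric time and a column named abp. The sketch below is one possible way to close that gap, not the only one. It assumes a tidyr version that supports the names_transform argument, widens the pattern to \\d+ so that suffixes of 10 or more would also match, and renames the column with base R. The object name long is chosen here for illustration only.

```r
library(tidyr)

# Example data from the question
dat <- data.frame(
  id = c(1, 2, 3),
  pdq1 = c(3, 5, 6), pdq2 = c(1, 3, 4), pdq3 = c(4, 5, 4),
  scor_abp1 = c(1, 2, 3), scor_abp2 = c(2, 2, 4), scor_abp3 = c(1, 4, 5)
)

long <- pivot_longer(
  dat,
  -id,
  names_to = c(".value", "time"),
  # first group: measure name, second group: time suffix (one or more digits)
  names_pattern = "(pdq|scor_abp)(\\d+)",
  # convert the extracted suffix from character to integer
  names_transform = list(time = as.integer)
)

# Rename scor_abp to abp to match the desired output
names(long)[names(long) == "scor_abp"] <- "abp"

long
```

The result is a tibble; wrap it in as.data.frame() if a plain data frame is needed for the downstream longitudinal model.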