I have a wide format dataset, which contains year variables (yr1, yr2, yr3), and duration variables (yr1_time, yr2_time, yr3_time). yr1 ranges from 2023 to 2025. yr2 and yr3 equals to yr1+1 or +2, respectively.
id<-rep(c(1:20),times=1)
df1<-data.frame(id)
df1$yr1 <- sample(2022:2025, length(df1$id), replace=TRUE)
df1$yr1_time <- rnorm(n = 20, mean = 0, sd = 0.6)
df1$yr2 <- df1$yr1+1
df1$yr2_time <- rnorm(n = 20, mean = 0, sd = 0.6)
df1$yr3 <- df1$yr1+2
df1$yr3_time <- rnorm(n = 20, mean = 0, sd = 0.6)
print(df1)
# id yr1 yr1_time yr2 yr2_time yr3 yr3_time
# 1 1 2023 -0.18649844 2024 1.41458053 2025 -1.12031610
# 2 2 2025 -0.01977439 2026 0.68985414 2027 -0.69038076
# 3 3 2023 -0.08855173 2024 0.76039453 2025 -0.36913641
# 4 4 2023 0.28576478 2024 -0.35622031 2025 0.89810598
# 5 5 2024 -0.42831014 2025 -1.28914071 2026 0.44912268
# 6 6 2023 -1.02487195 2024 -0.27391726 2025 -0.62189347
# 7 7 2024 0.16888122 2025 -0.10572896 2026 -0.43966363
# 8 8 2025 0.80350550 2026 0.41403554 2027 -1.41913317
# 9 9 2023 0.59990953 2024 -0.42688373 2025 -0.73899889
How to shape the wide format to the long format? Here is my expected output:
id yr yr_time
1 2023 -0.18649844
1 2024 1.41458053
1 2025 -1.12031610
2 2025 -0.01977439
2 2026 0.68985414
2 2027 -0.69038076
Thanks!
You can do this by using pivot_longer()
from the tidyr package to convert the data to long format, then several functions from the dplyr package to get to the final format you want.
library(dplyr)
library(tidyr)
df1 |>
# Convert data to long format
pivot_longer(cols = starts_with("yr")) |>
mutate(
# If a row represents a year, assign its `value` to the `year` column
year = if_else(name %in% c("yr1", "yr2", "yr3"), value, NA_real_),
# Move all the values up one row (this moves the `yr_time` for each year
# into the same row as the corresponding year)
yr_time = lead(value)
) |>
# Remove the rows that don't contain a year value, since those rows aren't
# needed (and now contain the wrong `yr_time` values)
filter(!is.na(year)) |>
# Arrange columns in the order you asked for
select(id, year, yr_time)
If I haven't explained each step well, try running each step separately and you should be able to see how it works.