My grouped (longitudinal) data frame looks like this:
id <- c(1,1,2,2,2,3,3,3)
x <- c(0,0,1,1,1,1,1,1)
t1 <- c(1,0,1,1,0,0,1,0)
t2 <- c(2.1,1.5,1.7,2.0,2.0,2.8,2.1,2.1)
t3 <- c(21,18,23,26,27,25,31,22)
df <- data.frame(id, x, t1, t2, t3)
Column x is time-invariant, while t1, t2, and t3 are time-varying. For t1 and t2, I want to replace every value with the value from the first row of its id, repeated for every row of that id, while keeping the values of x and t3 unchanged. The expected output is:
id x t1 t2 t3
1 1 0 1 2.1 21
2 1 0 1 2.1 18
3 2 1 1 1.7 23
4 2 1 1 1.7 26
5 2 1 1 1.7 27
6 3 1 0 2.8 25
7 3 1 0 2.8 31
8 3 1 0 2.8 22
To solve this, I used the dplyr package to keep only the first row of each id, then added a column with the number of times each id is repeated (ntimes), and finally repeated each row ntimes times. My code:
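library(dplyr)

df <- df %>%
  group_by(id) %>%
  filter(row_number() == 1) %>%   # keep only the first row of each id
  ungroup()
df$ntimes <- c(2, 3, 3)           # hard-coded number of rows per id
df <- as.data.frame(lapply(df, rep, df$ntimes))  # repeat each row ntimes
df <- subset(df, select = -c(ntimes))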
This changes the t3 values, because the filter keeps only the first row of every column (t3 included) and those rows are then simply repeated. Also, hard-coding ntimes is inconvenient because my real data frame has many rows and columns. Is there a convenient way to get the expected output? Help is greatly appreciated!
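The filter-then-repeat idea above can also be repaired without ntimes: instead of repeating the first rows, take the first-row values of t1 and t2 for each id and join them back onto the original rows, so x and t3 are never touched. This is a minimal sketch, assuming dplyr is installed and R >= 4.1.0 (for the native `|>` pipe); `first_rows` and `res` are hypothetical names used only for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

id <- c(1, 1, 2, 2, 2, 3, 3, 3)
x  <- c(0, 0, 1, 1, 1, 1, 1, 1)
t1 <- c(1, 0, 1, 1, 0, 0, 1, 0)
t2 <- c(2.1, 1.5, 1.7, 2.0, 2.0, 2.8, 2.1, 2.1)
t3 <- c(21, 18, 23, 26, 27, 25, 31, 22)
df <- data.frame(id, x, t1, t2, t3)

# Hypothetical helper table: the first row of each id,
# keeping only the key and the columns to be overwritten
first_rows <- df |>
  group_by(id) |>
  filter(row_number() == 1) |>
  ungroup() |>
  select(id, t1, t2)

# Drop the original t1/t2 and join the first-row values back by id.
# left_join() keeps every row of df in its original order, so x and t3
# are unchanged and no per-id repeat count is needed.
res <- df |>
  select(-t1, -t2) |>
  left_join(first_rows, by = "id") |>
  select(id, x, t1, t2, t3)   # restore the original column order

res
```

Because the join is keyed on id, the number of rows per id never has to be known in advance, which is what made ntimes awkward on a large data frame.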
Using dplyr::across() and dplyr::first(), you can do:
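library(dplyr, warn.conflicts = FALSE)
# .by needs dplyr >= 1.1.0 and |> needs R >= 4.1.0; with older dplyr,
# use group_by(id) |> mutate(...) |> ungroup() instead
df |>
  mutate(across(c(t1, t2), first), .by = id)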
#> id x t1 t2 t3
#> 1 1 0 1 2.1 21
#> 2 1 0 1 2.1 18
#> 3 2 1 1 1.7 23
#> 4 2 1 1 1.7 26
#> 5 2 1 1 1.7 27
#> 6 3 1 0 2.8 25
#> 7 3 1 0 2.8 31
#> 8 3 1 0 2.8 22
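If you prefer to avoid dplyr, base R's ave() can do the same per-group replacement: it applies FUN within each id group and returns a vector aligned with the original rows, so the other columns stay as they are. This is a minimal sketch; `cols` is a hypothetical variable holding the names of the columns to overwrite.

```r
id <- c(1, 1, 2, 2, 2, 3, 3, 3)
x  <- c(0, 0, 1, 1, 1, 1, 1, 1)
t1 <- c(1, 0, 1, 1, 0, 0, 1, 0)
t2 <- c(2.1, 1.5, 1.7, 2.0, 2.0, 2.8, 2.1, 2.1)
t3 <- c(21, 18, 23, 26, 27, 25, 31, 22)
df <- data.frame(id, x, t1, t2, t3)

cols <- c("t1", "t2")  # hypothetical: the time-varying columns to overwrite

# For each selected column, ave() splits it by id, applies FUN to each
# group, and writes the result back in the original row order. FUN returns
# the group's first value, which is recycled across that group's rows.
df[cols] <- lapply(df[cols], function(v) ave(v, df$id, FUN = function(z) z[1]))

df
```

With many columns, only `cols` needs to change; x, t3, and any column not listed are left untouched.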