This question is about how to generate frequency transition tables from longitudinal data in the long format using R base functions or commonly used packages such as dplyr. Consider the longitudinal data
id <- c(1,1,2,2,3,3,4,4)
state <- c("C","A", "A", "A", "B", "A", "C", "A")
period <- rep(c("Start", "End"), 4)
df <- data.frame(id, state, period)
df
id state period
1 1 C Start
2 1 A End
3 2 A Start
4 2 A End
5 3 B Start
6 3 A End
7 4 C Start
8 4 A End
and the expected output
transition freq
1 A to A 1
2 A to B 0
3 A to C 0
4 B to B 0
5 B to A 1
6 B to C 0
7 C to C 0
8 C to A 2
9 C to B 0
I can generate the above output using the function statetable.msm in the msm package. However, I would like to know if this could be generated by base functions in R or other packages such as dplyr. Help is much appreciated!
With tidyverse
, using left_join
on the crossing
states
library(dplyr)
library(tidyr)
left_join(crossing(Start = df$state, End = df$state),
pivot_wider(df, names_from = period, values_from = state)) %>%
reframe(freq = id * 0 + n(), .by = c(Start, End)) %>%
distinct()
output
# A tibble: 9 × 3
Start End freq
<chr> <chr> <dbl>
1 A A 1
2 A B NA
3 A C NA
4 B A 1
5 B B NA
6 B C NA
7 C A 2
8 C B NA
9 C C NA
or, to get the complete desired format
left_join(crossing(Start = df$state, End = df$state),
pivot_wider(df, names_from = period, values_from = state)) %>%
mutate(freq = id * 0 + n(),
freq = replace(freq, is.na(freq), 0), .by = c(Start, End)) %>%
mutate(transition = purrr::map2_vec(Start, End, ~ paste(.x, "to", .y)), .before=1) %>%
select(-c(Start, End, id)) %>%
distinct()
Joining with `by = join_by(Start, End)`
# A tibble: 9 × 2
transition freq
<chr> <dbl>
1 A to A 1
2 A to B 0
3 A to C 0
4 B to A 1
5 B to B 0
6 B to C 0
7 C to A 2
8 C to B 0
9 C to C 0