I need to convert the event-level dataset to the patient-level data i.e. convert a long dataset to a wider one based on deidnum
as a key variable. Additionally, I want to create columns for each produced event and its event time. In case the same patient has more than one event, consider the earliest event time.
Here is a similar data sample and my code:
df <- read.table(text = "deidnum,eventc,EVENTDT,MI_COMPLICATED
325107,MI,21,1
325107,New Rose Dyspnea Scale 2 or more,1468,NA
418351,New Rose Dyspnea Scale 2 or more,207,NA
839172,New Rose Dyspnea Scale 2 or more,1060,NA
839172,New Rose Dyspnea Scale 2 or more,1718,NA
1487422,MI,990,0
1487422,DEATH,1113,NA
1511165,MI,424,0
1511165,MI,608,1
1511165,New Rose Dyspnea Scale 2 or more,721,NA
", sep = ",", header = TRUE)
library(reshape2)
wide.df <- dcast(df, deidnum ~ eventc)
wide.df
The current output
deidnum DEATH MI New Rose Dyspnea Scale 2 or more
1 325107 0 1 1
2 418351 0 0 1
3 839172 0 0 2
4 1487422 1 1 0
5 1511165 0 2 1
Any advice will be greatly appreciated.
A tidyverse
workflow:
library(tidyr)
library(dplyr)
df %>%
slice_min(EVENTDT, by = c(deidnum, eventc)) %>%
pivot_wider(id_cols = deidnum, names_from = eventc,
values_from = c(eventc, EVENTDT),
values_fn = list(eventc = length),
values_fill = list(eventc = 0),
unused_fn = first) %>%
rename_with(~ sub("eventc_", "", .x), starts_with("eventc"))
# # A tibble: 5 × 8
# deidnum MI `New Rose Dyspnea Scale 2 or more` DEATH EVENTDT_MI `EVENTDT_New Rose Dyspnea Scale 2 or more` EVENTDT_DEATH MI_COMPLICATED
# <int> <int> <int> <int> <int> <int> <int> <int>
# 1 325107 1 1 0 21 1468 NA 1
# 2 418351 0 1 0 NA 207 NA NA
# 3 839172 0 1 0 NA 1060 NA NA
# 4 1487422 1 0 1 990 NA 1113 0
# 5 1511165 1 1 0 424 721 NA 0
Note: unused_fn = first
is for grouping by the id_cols
columns (deidnum
) then summarizing the unused columns (MI_COMPLICATED
) using first()
(assuming it has been ordered by EVENTDT
).