In Rstudio I've got a dataset with labresult similar to like below. It has the values Crea, Gluc and Hb and each patient has a participation start date.
Note: the real dataset is larger, with more patients and more labresults.
I've added a column with days since, and I want to add a column MomentCode
to mark the records as closest to baseline, month3 and month12, with a tolerance of one week (7 days).
After that I want to limit to only the single labresults closest (but not before) the baseline, 3 months, 12 months etc.
So in the end, the result format should be something like this:
PatientID | ParticipationDate | BASELINE_Crea | BASELINE_Gluc | BASELINE_Hb | MONTH3_Crea | MONTH3_Gluc | MONTH3_Hb | MONTH12_Crea | MONTH12_Gluc | MONTH12_Hb |
---|---|---|---|---|---|---|---|---|---|---|
PAT_001 | 05-01-2023 | 5.0 | 5.0 | etc | etc | |||||
PAT_220 | 12-03-2023 | 5.0 | 5.0 |
I'm doing everything step-by-step but maybe there's a better way? See code below
df <- read.table(text = "
PatientID,ParticipationDate,LabResultDate,LabType,LabValue,Verified
PAT_001,05-01-2023,05-01-2023,Crea,5.0,Yes
PAT_001,05-01-2023,07-01-2023,Gluc,4.2,Yes
PAT_001,05-01-2023,09-01-2023,Hb,4.2,No
PAT_220,12-03-2023,03-03-2023,Hb,5.2,Yes
PAT_220,12-03-2023,15-03-2023,Hb,5.3,Yes
PAT_220,12-03-2023,16-03-2023,Gluc,4.4,Yes
PAT_001,05-01-2023,03-04-2023,Gluc,4.6,Yes
PAT_001,05-01-2023,06-04-2023,Crea,5.4,Yes
PAT_001,05-01-2023,07-04-2023,Crea,5.0,Yes
PAT_001,05-01-2023,08-04-2023,Hb,5.1,Yes
PAT_220,12-03-2023,11-06-2023,Gluc,5.3,Yes
PAT_220,12-03-2023,12-06-2023,Crea,4.8,No
PAT_220,12-03-2023,14-06-2023,Hb,4.6,Yes
PAT_220,12-03-2023,28-06-2023,Crea,3.9,No
PAT_220,12-03-2023,23-07-2023,Hb,5.1,No
PAT_001,05-01-2023,27-07-2023,Gluc,4.3,Yes
PAT_220,12-03-2023,29-07-2023,Crea,5.1,Yes
PAT_220,12-03-2023,25-08-2023,Gluc,4.9,Yes
PAT_220,12-03-2023,27-08-2023,Crea,4.3,Yes
PAT_220,12-03-2023,14-09-2023,Crea,5.5,Yes
PAT_001,05-01-2023,17-09-2023,Hb,5.5,Yes
PAT_220,12-03-2023,09-11-2023,Hb,5.4,No
PAT_001,05-01-2023,13-11-2023,Gluc,4.2,Yes
PAT_001,05-01-2023,17-11-2023,Hb,5.2,Yes
PAT_001,05-01-2023,29-12-2023,Crea,5.4,Yes
PAT_001,05-01-2023,31-12-2023,Crea,4.4,Yes
PAT_220,12-03-2023,03-01-2024,Gluc,4.2,Yes
PAT_001,05-01-2023,09-01-2024,Gluc,5.4,Yes
PAT_001,05-01-2023,09-01-2024,Hb,4.0,Yes
PAT_001,05-01-2023,13-01-2024,Crea,4.7,Yes
PAT_001,05-01-2023,07-03-2024,Hb,4.2,Yes
PAT_220,12-03-2023,14-03-2024,Gluc,4.4,Yes
PAT_220,12-03-2023,15-03-2024,Crea,5.0,No
PAT_220,12-03-2023,17-03-2024,Hb,3.9,Yes
PAT_220,12-03-2023,23-05-2024,Crea,4.4,Yes
PAT_001,05-01-2023,23-06-2024,Gluc,4.8,No
PAT_220,12-03-2023,04-08-2024,Hb,4.3,Yes
PAT_220,12-03-2023,24-08-2024,Gluc,4.5,Yes
", header = TRUE, sep = ",", na.strings = "")
# days since participation
df$DaysSince <- as.numeric(difftime(df$ParticipationDate, df$LabResultDate, units = "days"))
# mark as moment (one week tolerance) how to??
library(dplyr)
df <- df %>%
mutate(MomentCode = case_when(
DaysSince>=0 && DaysSince<7 ~ "Baseline",
DaysSince>=90 && DaysSince<90+7 ~ "Month3",
DaysSince>=365 && DaysSince<365+7 ~ "Month12"
)
)
# patition by patient and code and moment to get only values closest to each moment code, how??
# pivot to wide format, how??
A couple of comments regarding your example data:
DaysSince
(LabResultDate
before ParticipationDate
which does not make sense)Having said that, I made the following assumptions:
cut
the rest of the code would remain the same in this case)library(dplyr)
library(tidyr)
library(lubridate)
df2 <- df %>%
as_tibble() %>% ## format as tibble for nicer printing
mutate(## format date columns as date
across(ends_with("Date"), dmy),
## calculate differences in days
DaysSince = as.numeric(difftime(LabResultDate, ParticipationDate, unit = "days")),
## split day differences into buckets with cutoffs 7 and 97 days
MomentCode = cut(DaysSince, c(-Inf, 7, 97, Inf), c("Baseline", "Month3", "Month12")),
## create new column comnbining LabType and MomentCode
Measurement = paste(LabType, MomentCode, sep = "_")
)
df2 %>%
pivot_wider(names_from = Measurement,
values_from = LabValue,
id_cols = PatientID:ParticipationDate,
## take mean for duplicated measurements
values_fn = mean)
# # A tibble: 2 × 11
# PatientID ParticipationDate Crea_Baseline Gluc_Baseline Hb_Baseline Gluc_Month3 Crea_Month3 Hb_Month3
# <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 PAT_001 2023-01-05 5 4.2 4.2 4.6 5.2 5.1
# 2 PAT_220 2023-03-12 NA 4.4 5.25 5.3 4.8 4.6
# # ℹ 3 more variables: Crea_Month12 <dbl>, Hb_Month12 <dbl>, Gluc_Month12 <dbl>