rpivotpartitioning

How to partition by patient and labresulttype and moment, to select the result closest to begin of momentcode and pivot to wide format?


In Rstudio I've got a dataset with labresult similar to like below. It has the values Crea, Gluc and Hb and each patient has a participation start date.

Note: the real dataset is larger, with more patients and more labresults.

I've added a column with days since, and I want to add a column MomentCode to mark the records as closest to baseline, month3 and month12, with a tolerance of one week (7 days).

After that I want to limit to only the single labresults closest (but not before) the baseline, 3 months, 12 months etc.

So in the end, the result format should be something like this:

PatientID ParticipationDate BASELINE_Crea BASELINE_Gluc BASELINE_Hb MONTH3_Crea MONTH3_Gluc MONTH3_Hb MONTH12_Crea MONTH12_Gluc MONTH12_Hb
PAT_001 05-01-2023 5.0 5.0 etc etc
PAT_220 12-03-2023 5.0 5.0

I'm doing everything step-by-step but maybe there's a better way? See code below

df <- read.table(text = "
PatientID,ParticipationDate,LabResultDate,LabType,LabValue,Verified
PAT_001,05-01-2023,05-01-2023,Crea,5.0,Yes
PAT_001,05-01-2023,07-01-2023,Gluc,4.2,Yes
PAT_001,05-01-2023,09-01-2023,Hb,4.2,No
PAT_220,12-03-2023,03-03-2023,Hb,5.2,Yes
PAT_220,12-03-2023,15-03-2023,Hb,5.3,Yes
PAT_220,12-03-2023,16-03-2023,Gluc,4.4,Yes
PAT_001,05-01-2023,03-04-2023,Gluc,4.6,Yes
PAT_001,05-01-2023,06-04-2023,Crea,5.4,Yes
PAT_001,05-01-2023,07-04-2023,Crea,5.0,Yes
PAT_001,05-01-2023,08-04-2023,Hb,5.1,Yes
PAT_220,12-03-2023,11-06-2023,Gluc,5.3,Yes
PAT_220,12-03-2023,12-06-2023,Crea,4.8,No
PAT_220,12-03-2023,14-06-2023,Hb,4.6,Yes
PAT_220,12-03-2023,28-06-2023,Crea,3.9,No
PAT_220,12-03-2023,23-07-2023,Hb,5.1,No
PAT_001,05-01-2023,27-07-2023,Gluc,4.3,Yes
PAT_220,12-03-2023,29-07-2023,Crea,5.1,Yes
PAT_220,12-03-2023,25-08-2023,Gluc,4.9,Yes
PAT_220,12-03-2023,27-08-2023,Crea,4.3,Yes
PAT_220,12-03-2023,14-09-2023,Crea,5.5,Yes
PAT_001,05-01-2023,17-09-2023,Hb,5.5,Yes
PAT_220,12-03-2023,09-11-2023,Hb,5.4,No
PAT_001,05-01-2023,13-11-2023,Gluc,4.2,Yes
PAT_001,05-01-2023,17-11-2023,Hb,5.2,Yes
PAT_001,05-01-2023,29-12-2023,Crea,5.4,Yes
PAT_001,05-01-2023,31-12-2023,Crea,4.4,Yes
PAT_220,12-03-2023,03-01-2024,Gluc,4.2,Yes
PAT_001,05-01-2023,09-01-2024,Gluc,5.4,Yes
PAT_001,05-01-2023,09-01-2024,Hb,4.0,Yes
PAT_001,05-01-2023,13-01-2024,Crea,4.7,Yes
PAT_001,05-01-2023,07-03-2024,Hb,4.2,Yes
PAT_220,12-03-2023,14-03-2024,Gluc,4.4,Yes
PAT_220,12-03-2023,15-03-2024,Crea,5.0,No
PAT_220,12-03-2023,17-03-2024,Hb,3.9,Yes
PAT_220,12-03-2023,23-05-2024,Crea,4.4,Yes
PAT_001,05-01-2023,23-06-2024,Gluc,4.8,No
PAT_220,12-03-2023,04-08-2024,Hb,4.3,Yes
PAT_220,12-03-2023,24-08-2024,Gluc,4.5,Yes
", header = TRUE, sep = ",", na.strings = "")

# days since participation
df$DaysSince <- as.numeric(difftime(df$ParticipationDate, df$LabResultDate, units = "days"))

# mark as moment (one week tolerance) how to??
library(dplyr)
df <- df %>%
  mutate(MomentCode = case_when(
           DaysSince>=0   && DaysSince<7     ~ "Baseline",
           DaysSince>=90  && DaysSince<90+7  ~ "Month3",
           DaysSince>=365 && DaysSince<365+7 ~ "Month12"
         )
  )

# patition by patient and code and moment to get only values closest to each moment code, how??

# pivot to wide format, how??

Solution

  • A couple of comments regarding your example data:

    1. You calculate the difference between lab result and participation which should be the other way round.
    2. There are negative DaysSince (LabResultDate before ParticipationDate which does not make sense)
    3. There are differences between particpation and lab measurement which do not fall in either of your defined categories (less than 90 days but more than 7 days)
    4. There are multiple measurements for a given time frame lab type combination.

    Having said that, I made the following assumptions:

    1. We define 2 cutoffs: 7 days & 97 days. Any difference <= 7 days will fall into the baseline, differences > 7 days and <= 97 days fall into the 3 month slot and differnces > 97 days fall into the 12 month slot (however, it is rather easy to extend the persiods simply by adding more cutoofs / labels to cut the rest of the code would remain the same in this case)
    2. If there is more than one measurement per combination we take the mean of those measurements:
    library(dplyr)
    library(tidyr)
    library(lubridate)
    
    df2 <- df %>%
      as_tibble() %>% ## format as tibble for nicer printing
      mutate(## format date columns as date
             across(ends_with("Date"), dmy), 
             ## calculate differences in days
             DaysSince = as.numeric(difftime(LabResultDate, ParticipationDate, unit = "days")), 
             ## split day differences into buckets with cutoffs 7 and 97 days
             MomentCode = cut(DaysSince, c(-Inf, 7, 97, Inf), c("Baseline", "Month3", "Month12")),
             ## create new column comnbining LabType and MomentCode
             Measurement = paste(LabType, MomentCode, sep = "_")
      )
    
    df2 %>%
      pivot_wider(names_from = Measurement, 
                  values_from = LabValue, 
                  id_cols = PatientID:ParticipationDate, 
                  ## take mean for duplicated measurements
                  values_fn = mean)
    
    # # A tibble: 2 × 11
    #   PatientID ParticipationDate Crea_Baseline Gluc_Baseline Hb_Baseline Gluc_Month3 Crea_Month3 Hb_Month3
    #   <chr>     <date>                    <dbl>         <dbl>       <dbl>       <dbl>       <dbl>     <dbl>
    # 1 PAT_001   2023-01-05                    5           4.2        4.2          4.6         5.2       5.1
    # 2 PAT_220   2023-03-12                   NA           4.4        5.25         5.3         4.8       4.6
    # # ℹ 3 more variables: Crea_Month12 <dbl>, Hb_Month12 <dbl>, Gluc_Month12 <dbl>