rfor-looprecursiondplyrsapply

How to a create recursive variable per group in data frame?


I have a data set with multiple observations of a clinical outcome for each patient. The length between these time points is variable. I want to create a "roving baseline score" variable for the clinical outcome, that changes every time an increase or a decrease of the score is confirmed at the first time point that is at least 90 days ahead.

I have created an example data frame with observations for two patients (patient 1 has 6 observations, patient 2 only has 2 observations):

library(tidyverse)

df <- tibble(
patID  = c("1", "1", "1", "1", "1", "1", "2", "2", "3", "3", "3", "3", "3", "3"),     
time_point = c("1", "2", "3", "4", "5", "6", "1", "2", "1", "2", "3", "4", "5", "6"),
date = as.Date(c("2020-01-01", "2020-05-01", "2020-06-01", "2020-09-01", "2021-01-01", "2021-05-01", "2020-01-01", "2020-05-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-06-01", "2020-10-01", "2021-04-01")),
score = c("300", "100", "100", "100", "200", "200", "600", "400", "300", "200", "200", "100", "400", "300"))

The vector with the baseline score should result in the following:

baseline = c("300", "300", "300", "100", "100", "200", "600", "600", "300", "300", "300", "200", "200", "300") 

I have tried creating the variable using a for loop, but I don't know how to make it such that it is done for each patient separately.

EDIT: I tried to write the code as following using sapply, but I get the error "argument is of length 0"

df_baseline <- df %>%
group_by(patID) %>%
mutate(baseline = lag(score, default = first(score)),
     closest_index_90 = sapply(1:n(), function(i) {
       valid_indices <- which(date[i]+90 <= date[(i+1):n()])
       if (length(valid_indices)>0) {return(first(valid_indices) + i)} 
       else {return(NA)}
     }),
     baseline = sapply(1:n(), function(i) {
       if (score[i:closest_index_90[i]] > baseline[i - 1] | score[i:closest_index_90[i]] < baseline[i - 1]) {return(score[which.min(abs(score[i:closest_index_90[i]]-baseline[i-1]))])}
     else {return(baseline[i-1])}}))

I am relatively new to R so my skills with for loops and sapply functions are limited. I have looked through many questions asked on here but none seemed to answer my question specifically. Any help would be greatly appreciated!


Solution

  • Found a solution using purrr::accumulate2

    df %>%
    group_by(patID) %>%
    mutate(
    closest_index_90 = sapply(1:n(), function(i) {
       valid_indices <- which(date[i]+90 <= date[(i+1):n()])
       if (length(valid_indices)>0) {return(first(valid_indices) + i)} 
       else {return(NA)}
     }),
    baseline = accumulate2(
       time_point,
       replace_na(closest_index_90, max(i)),
       \(b, i, r) {
       if (b>max(score[i:r])) {max(score[i:r]}
       else if (b<min(score[i:r])) {min(score[i:r]}
       else b
       }, .init = score[[1]]
       )[-1])