Tags: r, for-loop, dplyr, indexing, sapply

Comparing a selected range of indexes to a threshold


I want to compare all scores in a certain range (from row i through the row given by range_index[i]) to the last baseline score, and update the baseline score recursively. The range is defined by the index of the first row that meets the minimal time difference required to confirm a new baseline. If all scores in this range are lower than the last baseline score, then I want the new baseline to become the highest of the scores in the range (i.e., the one closest to the old baseline).

df <- tibble(
i = c("1", "2", "3", "4", "5", "6", "7", "8", "9"),
range_index = c("2", "4", "4", "5", "7", "7", "9", "9", "NA"),
score = c("5", "4", "4", "3", "2", "2", "3", "1", "1")) 

I am looking to do something like this in sapply or a for loop:

df <- df %>%
mutate(
baseline = first(score),
baseline = sapply(1:n(), function(i) {
  if (all(score[i]:score[range_index[i]]) < baseline[i-1]) {return(max(score[i]:score[range_index[i]]))}
  else {return(baseline[i-1])}}))

But I think score[i]:score[range_index[i]] doesn't compare all scores in the range to the last baseline. How can I create a condition that is true only if every one of these scores is lower than the last baseline?
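
To illustrate the suspicion above, here is a minimal sketch using a numeric copy of the example scores (an assumption, since the columns in the tibble above are character). It shows that `:` between two scores builds a sequence between those two values, whereas indexing with a range of positions returns the scores themselves. The variable names `by_value`, `by_position` and `all_below` are only for illustration.

```r
# Numeric copy of the example scores (the original columns are character)
score <- c(5, 4, 4, 3, 2, 2, 3, 1, 1)

# `:` between two scores builds a sequence between those two VALUES
by_value <- score[5]:score[7]       # the sequence from 2 to 3

# Indexing with a range of POSITIONS returns the scores in rows 5 to 7
by_position <- score[5:7]           # 2 2 3

# Condition: every score in rows 2 to 4 is below a baseline of 5.
# The comparison goes inside all(); all(x) < y would compare one logical to y.
all_below <- all(score[2:4] < 5)    # TRUE
```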

The desired outcome is:

baseline = c("5", "4", "4", "3", "3", "3", "3", "1", "1")

Explanation: the first baseline is 5. At i=2 the new baseline is set to 4, as all scores from i=2 to i=4 (the corresponding range) are lower than 5. The new baseline is 4, and not 3, because 4 is the greatest score from i=2 to i=4. At i=4 we obtain the new baseline 3, because all scores in the range (score[4]=3, score[5]=2) are lower than the last baseline, which was 4. At i=5, we don't obtain a new baseline despite the decrease, because the range includes i=7, and score[7] (==3) is not lower than the last baseline (==3). The new baseline at i=8 is obtained because all scores from i=8 to i=9 are lower than the last baseline of 3.


Solution

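  • Since each baseline depends on the previous one, I think it's best to iterate with a for loop. Taking the minimum of the previous baseline and the maximum score in the range is equivalent to your condition: the baseline only changes when every score in the range is below it.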

    library(tidyverse)
    
    df <- tibble(id = rep(1:2, each = 9),
                 range_index = rep(c(2,4,4,5,7,7,9,9,NA), 2),
                 score = c(5,4,4,3,2,2,3,1,1,5,4,4,3,2,2,3,1,0))
    df %>%
      group_by(id) %>%
      mutate(
        baseline = {
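          # Start from the raw scores; row 1 keeps its own score as baseline
          baseline <- score
          for (i in 2:(n() - 1)) {
            # Index the scores by position (rows i through range_index[i]);
            # score[i]:score[range_index[i]] would build a sequence between two values
            baseline[i] <- min(baseline[i - 1], max(score[i:range_index[i]]))
          }
          # The last row has no range (NA), so it carries the previous baseline forward
          baseline[n()] <- baseline[n() - 1]
          baseline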
        }
      )
    #> # A tibble: 18 × 4
    #> # Groups:   id [2]
    #>       id range_index score baseline
    #>    <int>       <dbl> <dbl>    <dbl>
    #>  1     1           2     5        5
    #>  2     1           4     4        4
    #>  3     1           4     4        4
    #>  4     1           5     3        3
    #>  5     1           7     2        3
    #>  6     1           7     2        3
    #>  7     1           9     3        3
    #>  8     1           9     1        1
    #>  9     1          NA     1        1
    #> 10     2           2     5        5
    #> 11     2           4     4        4
    #> 12     2           4     4        4
    #> 13     2           5     3        3
    #> 14     2           7     2        3
    #> 15     2           7     2        3
    #> 16     2           9     3        3
    #> 17     2           9     1        1
    #> 18     2          NA     0        1
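
The same rule can also be written as a standalone base R function, which may be easier to test outside of a pipeline. This is only a sketch: `update_baseline` is a hypothetical helper name, not part of the answer above, and it assumes numeric `score` and `range_index` vectors in which any row with a missing `range_index` simply carries the previous baseline forward.

```r
# Hypothetical helper (not from the answer above): rolling baseline update.
# Assumes numeric inputs; rows with NA in range_index keep the previous baseline.
update_baseline <- function(score, range_index) {
  n <- length(score)
  if (n == 0) return(numeric(0))

  baseline <- numeric(n)
  baseline[1] <- score[1]              # the first baseline is the first score

  for (i in seq_len(n)[-1]) {          # rows 2..n; empty when n == 1
    if (is.na(range_index[i])) {
      baseline[i] <- baseline[i - 1]   # no range to check: carry forward
      next
    }
    window <- score[i:range_index[i]]  # scores in rows i through range_index[i]
    baseline[i] <- if (all(window < baseline[i - 1])) {
      max(window)                      # all lower: highest score becomes the baseline
    } else {
      baseline[i - 1]                  # otherwise keep the last baseline
    }
  }
  baseline
}

score <- c(5, 4, 4, 3, 2, 2, 3, 1, 1)
range_index <- c(2, 4, 4, 5, 7, 7, 9, 9, NA)
result <- update_baseline(score, range_index)
result
#> [1] 5 4 4 3 3 3 3 1 1
```

Inside a grouped pipeline it could be called as `mutate(baseline = update_baseline(score, range_index))` after `group_by(id)`, which should give the same baseline column as the loop above for this data.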