rgroupingsequentialmutate

In sequential order, find the first pair of results with a given relative change per group, in R


Consider this data:

   group visit result     rc10
   <dbl> <dbl>  <dbl>    <dbl>
 1     1     1    291        0
 2     1     2    195        0
 3     1     3    159        1
 4     1     4    140        0
 5     1     5    152        1
 6     2     1    218        0
 7     2     2    194        0
 8     2     3    172        0
 9     2     4    140        1
10     2     5    145        1
11     2     6    142        0
12     3     1    254        0
13     3     2    174        1
14     3     3    418        0
15     3     4    170        1
16     4     1    160        1
17     4     2    170        1
18     4     3    120        0
19     5     1    312        1
20     5     2    260        0
21     5     3    301        1
22     5     4    305        0
23     5     5    262        0 

Given the relative change formula RC = 100 * (final result - initial result) / initial result, how to identify, for each group, the first pair of results with a RC <10%? By "first pair", I mean identifying the pair of results with a RC<10% in the fewest possible visits. Put another way, the idea is, in each group, to calculate the RC for each pair of results, and to identify the pair that will have the RC<10% with the fewest possible visits, whether the results of this pair are consecutive or not.
Each element of this pair would be identified by 1 in a new column rc10.
NB: Subsequent results (beyond those of the pair) also showing a RC<10% will not be kept.

To illustrate the approach using group 1 as an example, test sequentially:
2 visits:
RC(1vs2) = 100*(291-195)/291 = 33.0%, so 0 for both
So, introduce a 3rd visit:
RC(1vs3) = 100*(291-159)/291 = 45.4%, so 0 for both
RC(2vs3) = 100*(195-159)/195 = 18.5%, so 0 for both
So, introduce a 4th visit:
RC(1vs4) = 100*(291-140)/291 = 51.9%, so 0 for both
RC(2vs4) = 100*(195-140)/195 = 28.2%, so 0 for both
RC(3vs4) = 100*(159-140)/159 = 11.9%, so 0 for both
So, introduce a 5th visit:
RC(1vs5) = 100*(291-152)/291 = 47.8%, so 0 for both
RC(2vs5) = 100*(195-152)/195 = 22.0%, so 0 for both
RC(3vs5) = 100*(159-152)/159 = 4.4% which is within +/-10%, so 1 for both
RC(4vs5) = 100*(140-152)/140 = -8.6% which is also within +/-10%, but the right pair has already been found previously, so 0 for the new result 140.

Thanks for help.

Data:

dat <-
structure(list(group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 
3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5), visit = c(1, 2, 3, 4, 5, 1, 
2, 3, 4, 5, 6, 1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4, 5), result = c(291, 
195, 159, 140, 152, 218, 194, 172, 140, 145, 142, 254, 174, 418, 
170, 160, 170, 120, 312, 260, 301, 305, 262), rc10 = c(0, 0, 
1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-23L))

Solution

  • The function could perhaps be simplified/improved but this should return the first pair where the absolute RC metric is < 10.

    library(data.table)
    
    setDT(dat)
    
    find_pair <- function(x){
      check_condition <- \(x, y) {
         # relative change less than 10%
         abs(100*(y - x)/x) < 10
      }
      
      n <- length(x)
      p <- integer(n)
      if (n < 2) return(p)
      # which (if any) pair differences meet criteria
      d <- lapply(seq(n-1), \(i) check_condition(x[i], x[(i+1):n]) ) 
      # any pairs?
      m <- which(sapply(d, any))
      if (length(m) < 1) return(p) # if no pairs return all 0
      p1 <- min(m) # index of first member of pair
      p2 <- min(which(d[[p1]])) # relative pos of second member
      p[c(p1, p1+p2)] <- 1
      p
    }
    
    # check
    find_pair(c(10, 20, 10))
    #> [1] 1 0 1
    find_pair(c(10, 20, 30))
    #> [1] 0 0 0
    find_pair(c(10, 20, 9.1))
    #> [1] 1 0 1
    
    dat[, pair := find_pair(result), by=.(group)]
    dat
    #>     group visit result varinf10  pair
    #>     <num> <num>  <num>    <num> <num>
    #>  1:     1     1    291        0     0
    #>  2:     1     2    195        0     0
    #>  3:     1     3    159        1     1
    #>  4:     1     4    140        0     0
    #>  5:     1     5    152        1     1
    #>  6:     2     1    218        0     0
    #>  7:     2     2    194        0     0
    #>  8:     2     3    172        0     0
    #>  9:     2     4    140        1     1
    #> 10:     2     5    145        1     1
    #> 11:     2     6    142        0     0
    #> 12:     3     1    254        0     0
    #> 13:     3     2    174        1     1
    #> 14:     3     3    418        0     0
    #> 15:     3     4    170        1     1
    #> 16:     4     1    160        1     1
    #> 17:     4     2    170        1     1
    #> 18:     4     3    120        0     0
    #> 19:     5     1    312        1     1
    #> 20:     5     2    260        0     0
    #> 21:     5     3    301        1     1
    #> 22:     5     4    305        0     0
    #> 23:     5     5    262        0     0
    #>     group visit result varinf10  pair
    

    Created on 2025-11-03 with reprex v2.1.1

    Or find_pair could be used with dplyr

    dat |> 
      group_by(group) |> 
         mutate(pair = find_pair(result))
    

    Further, the check_condition function can be modified to change the rule for identifying pairs. For example, from the comment on this answer, if the rule was to look for absolute differences less than 10 in relative change (%) units when the original value is >= 170 but use an absolute difference of less than 15 raw units when the initial value is < 170 the check_condition function could be (Note this function assumes x is not NA).

      check_condition <- \(x, y) {
        if (x < 170){
          # absolute change of less than 15 units
          abs(y-x) < 15
        } else{
          # relative change less than 10%
          abs(100*(y - x)/x) < 10
        }
      }