Tags: r, loops, conditional-statements, time-series, summarize

Function to evaluate a vector by time and stop once it reaches a threshold


I have a data frame with three variables: id, year and value. For each id, I would like to know how many consecutive times "value" has been larger than a threshold, going backwards from the most recent year to the oldest year.

id <- rep(c("a", "b", "c"), each = 5)
year <- rep(1997:2001, times = 3)
value <- c(0.1, 0.5, 0, 0, 0, 0, 0.1, 0.6, 0.7, 0.4, 0.6, 0, 0.3, 0.5, 0.5)
data <- data.frame(id, year, value)

With a threshold of 0, the row-level result would be:

id year value output
a 1997 0.1 FALSE
a 1998 0.5 FALSE
a 1999 0 FALSE
a 2000 0 FALSE
a 2001 0 FALSE
b 1997 0 FALSE
b 1998 0.1 TRUE
b 1999 0.6 TRUE
b 2000 0.7 TRUE
b 2001 0.4 TRUE
c 1997 0.6 FALSE
c 1998 0 FALSE
c 1999 0.3 TRUE
c 2000 0.5 TRUE
c 2001 0.5 TRUE

I would like the function to start from the most recent "year" (2001) and check whether "value" is larger than 0 (or any other threshold). If it is, move to the second most recent year (2000); if that value is also larger than 0, move to the third most recent year, and so on. As soon as a value is equal to 0, stop and assign FALSE to that year and to all remaining (older) years in that group. Summarizing then gives the number of uninterrupted years with a value larger than 0, counted from the most recent year backwards. For the example, I would expect the following, where output is that count:

id output
a 0
b 4
c 3
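
The procedure described above can be written as an explicit loop, which may help pin down the expected behaviour before vectorising it. This is only a sketch: `count_recent_run` and its `threshold` argument are hypothetical names, it assumes one row per id and year, and treating a missing value as the end of the run is an assumption not stated in the question.

```r
# Hypothetical helper: count how many of the most recent years in a row
# have a value strictly above `threshold`, stopping at the first failure.
count_recent_run <- function(year, value, threshold = 0) {
  ord <- order(year, decreasing = TRUE)  # most recent year first
  n <- 0
  for (v in value[ord]) {
    # Assumption: an NA ends the run just like a value at or below the threshold
    if (is.na(v) || v <= threshold) break
    n <- n + 1
  }
  n
}

id <- rep(c("a", "b", "c"), each = 5)
year <- rep(1997:2001, times = 3)
value <- c(0.1, 0.5, 0, 0, 0, 0, 0.1, 0.6, 0.7, 0.4, 0.6, 0, 0.3, 0.5, 0.5)
data <- data.frame(id, year, value)

# Apply the helper to each id separately
result <- sapply(split(data, data$id),
                 function(d) count_recent_run(d$year, d$value))
print(result)
```

For the example data this should return 0, 4 and 3 for ids a, b and c.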

Solution

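    library(dplyr)  # .by requires dplyr >= 1.1.0
    
    data %>%
      # most recent year first within each id
      arrange(id, -year) %>%
      # cummin() drops to 0 at the first value <= 0 and stays there
      summarise(output=sum(cummin(value>0)), .by=id)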
    

Gives:

      id output
    1  a      0
    2  b      4
    3  c      3
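
The trick is that `cummin()` on a 0/1 vector stays at 1 only while every element so far is 1, so after sorting newest first it marks exactly the uninterrupted run. As an illustration, here is a base R sketch of the same idea that also reproduces the row-level `output` column from the question. The names `threshold`, `newest_first`, `flagged` and `counts` are illustrative, not taken from the original answer.

```r
id <- rep(c("a", "b", "c"), each = 5)
year <- rep(1997:2001, times = 3)
value <- c(0.1, 0.5, 0, 0, 0, 0, 0.1, 0.6, 0.7, 0.4, 0.6, 0, 0.3, 0.5, 0.5)
data <- data.frame(id, year, value)
threshold <- 0  # illustrative; any other cut-off works the same way

# Sort so that the most recent year comes first within each id
newest_first <- data[order(data$id, -data$year), ]

# Running minimum per id: 1 while every value so far exceeds the threshold,
# 0 from the first failure onwards
newest_first$output <- as.logical(
  ave(as.numeric(newest_first$value > threshold), newest_first$id, FUN = cummin)
)

# Row-level flags, back in oldest-first order as in the question
flagged <- newest_first[order(newest_first$id, newest_first$year), ]

# Number of uninterrupted recent years per id
counts <- aggregate(output ~ id, data = flagged, FUN = sum)
print(counts)
```

`ave()` applies `cummin` within each id, playing the role of `.by = id`, and `aggregate()` then sums the flags per group.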