[SOLVED] Function to evaluate vector based on time and stops once it reaches a threshold

I have a data frame with three variables: id, year, and value. For each id, I would like to count how many consecutive years the "value" has been larger than a threshold, going backwards from the most recent year to the oldest.

id <- rep(c("a", "b", "c"), each = 5)
year <- rep(1997:2001, times = 3)
value <- c(0.1, 0.5, 0, 0, 0, 0, 0.1, 0.6, 0.7, 0.4, 0.6, 0, 0.3, 0.5, 0.5)
data <- data.frame(id, year, value)

Applying this rule to the example data, the per-row flag ("output") would be:

id	year	value	output
a	1997	0.1	FALSE
a	1998	0.5	FALSE
a	1999	0	FALSE
a	2000	0	FALSE
a	2001	0	FALSE
b	1997	0	FALSE
b	1998	0.1	TRUE
b	1999	0.6	TRUE
b	2000	0.7	TRUE
b	2001	0.4	TRUE
c	1997	0.6	FALSE
c	1998	0	FALSE
c	1999	0.3	TRUE
c	2000	0.5	TRUE
c	2001	0.5	TRUE

I would like the function to start from the most recent "year" (2001) and check whether the "value" is larger than 0 (or any other threshold). If it is, move to the second most recent year (2000), then to the third most recent, and so on. As soon as a value is not larger than the threshold, stop and assign "FALSE" to that year and to all older years in that group.

Summarizing the flags then gives the number of uninterrupted years, counted back from the most recent, with a value larger than the threshold. For the example I would expect:

id	output
a	0
b	4
c	3
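The rule described above can also be written as an explicit loop, which may make the stopping condition easier to see. This is only a minimal base R sketch: the name count_recent_run is hypothetical, and it assumes there are no missing values in "value".

```r
# Hypothetical helper: count consecutive years, newest first,
# whose value is strictly larger than the threshold.
count_recent_run <- function(value, year, threshold = 0) {
  ordered <- value[order(year, decreasing = TRUE)]  # newest year first
  n <- 0L
  for (v in ordered) {
    if (!(v > threshold)) break  # the first failure ends the run
    n <- n + 1L
  }
  n
}

# Values for id "b" from the example data
b_year <- 1997:2001
b_value <- c(0, 0.1, 0.6, 0.7, 0.4)
count_recent_run(b_value, b_year)  # 4
```

Applied to the values for id "a" it returns 0, because the 2001 value already fails the test.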

Solution

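library(dplyr)  # the .by argument needs dplyr 1.1.0 or later

data %>%
  arrange(id, desc(year)) %>%  # newest year first within each id
  # cummin() drops to 0 at the first value that fails the test and stays there
  summarise(output = sum(cummin(value > 0)), .by = id)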

Gives

  id output
1  a      0
2  b      4
3  c      3
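If the per-row TRUE/FALSE column from the question is wanted as well, the same cummin() idea should work inside mutate(). The following is a sketch rather than part of the original answer: the names threshold and flagged are introduced here for illustration, and it assumes dplyr 1.1.0 or later for .by.

```r
library(dplyr)

# Example data from the question
id <- rep(c("a", "b", "c"), each = 5)
year <- rep(1997:2001, times = 3)
value <- c(0.1, 0.5, 0, 0, 0, 0, 0.1, 0.6, 0.7, 0.4, 0.6, 0, 0.3, 0.5, 0.5)
data <- data.frame(id, year, value)

threshold <- 0  # assumed cutoff; change as needed

flagged <- data %>%
  arrange(id, desc(year)) %>%  # newest year first within each id
  # cummin() on a logical vector returns 1 until the first FALSE, then 0
  mutate(output = as.logical(cummin(value > threshold)), .by = id) %>%
  arrange(id, year)            # back to the original ordering

# Summing the flags per id reproduces the counts
counts <- flagged %>%
  summarise(output = sum(output), .by = id)
```

Here cummin(c(TRUE, TRUE, FALSE, TRUE)) evaluates to 1 1 0 0, which is why a later value above the threshold does not restart the count.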