I have a data frame with three variables: `id`, `year` and `value`. For each `id`, I would like to count how many consecutive years have a `value` larger than a threshold, going backwards from the most recent year to the oldest.
```r
id <- rep(c("a", "b", "c"), each = 5)
year <- rep(1997:2001, times = 3)
value <- c(0.1, 0.5, 0, 0, 0, 0, 0.1, 0.6, 0.7, 0.4, 0.6, 0, 0.3, 0.5, 0.5)
data <- data.frame(id, year, value)
```
With this logic and a threshold of 0, the expected result per row is:
id | year | value | output |
---|---|---|---|
a | 1997 | 0.1 | FALSE |
a | 1998 | 0.5 | FALSE |
a | 1999 | 0 | FALSE |
a | 2000 | 0 | FALSE |
a | 2001 | 0 | FALSE |
b | 1997 | 0 | FALSE |
b | 1998 | 0.1 | TRUE |
b | 1999 | 0.6 | TRUE |
b | 2000 | 0.7 | TRUE |
b | 2001 | 0.4 | TRUE |
c | 1997 | 0.6 | FALSE |
c | 1998 | 0 | FALSE |
c | 1999 | 0.3 | TRUE |
c | 2000 | 0.5 | TRUE |
c | 2001 | 0.5 | TRUE |
Starting from the most recent year (2001), the function should check whether `value` is larger than 0 (or any other threshold). If it is, it should move on to the second most recent year (2000) and check again, then to the third most recent year, and so on. As soon as a value is not larger than the threshold (with a threshold of 0, a value equal to 0), it should stop and assign `FALSE` to that year and all older years in the group. Summarizing by `id` then gives the number of uninterrupted years, counted back from the most recent one, with a value above the threshold. For the example, I would expect:
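The rule described above can be sketched in base R to produce the per-row `output` column. This is a minimal sketch, not a definitive implementation; the helper name `flag_recent_run()` and its `threshold` argument are hypothetical. Within each `id` the rows are sorted newest first, and `cumprod()` over the 1/0 comparison stays at 1 until the first year that fails the test and is 0 from then on.

```r
# Example data from the question
data <- data.frame(
  id    = rep(c("a", "b", "c"), each = 5),
  year  = rep(1997:2001, times = 3),
  value = c(0.1, 0.5, 0, 0, 0, 0, 0.1, 0.6, 0.7, 0.4, 0.6, 0, 0.3, 0.5, 0.5)
)

# Hypothetical helper: TRUE for rows in the uninterrupted run of values above
# `threshold`, counted backwards from each id's most recent year.
flag_recent_run <- function(df, threshold = 0) {
  df <- df[order(df$id, -df$year), ]          # newest year first within each id
  above <- as.integer(df$value > threshold)   # 1 = above threshold, 0 = not
  run <- ave(above, df$id, FUN = cumprod)     # stays 1 until the first 0 per id
  df$output <- run == 1
  df <- df[order(df$id, df$year), ]           # back to oldest-to-newest order
  rownames(df) <- NULL
  df
}

flagged <- flag_recent_run(data)
print(flagged)
```

Passing a different `threshold` (for example `flag_recent_run(data, threshold = 0.4)`) applies the same stopping rule with a stricter cut-off.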
id | output |
---|---|
a | 0 |
b | 4 |
c | 3 |
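One way with dplyr: sort each `id` from the newest to the oldest year, and apply `cummin()` to the comparison so that every year from the first failing year onwards counts as 0. The sum is then the length of the uninterrupted run. (The `.by` argument requires dplyr 1.1.0 or later.)

```r
library(dplyr)

data %>%
  arrange(id, -year) %>%                                # newest year first within each id
  summarise(output = sum(cummin(value > 0)), .by = id)  # cummin drops to 0 at the first failure
```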
which gives:

```
  id output
1  a      0
2  b      4
3  c      3
```
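If dplyr is not available, the same `cummin()` idea can be sketched in base R with `tapply()`, with the threshold exposed as an argument. The helper name `count_recent_run()` is hypothetical, and this is one possible formulation rather than the only one.

```r
# Example data from the question
data <- data.frame(
  id    = rep(c("a", "b", "c"), each = 5),
  year  = rep(1997:2001, times = 3),
  value = c(0.1, 0.5, 0, 0, 0, 0, 0.1, 0.6, 0.7, 0.4, 0.6, 0, 0.3, 0.5, 0.5)
)

# Hypothetical helper: per id, count the consecutive most recent years whose
# value exceeds `threshold`; counting stops at the first year that does not.
count_recent_run <- function(df, threshold = 0) {
  df <- df[order(df$id, -df$year), ]     # newest year first within each id
  counts <- tapply(df$value, df$id, function(v) {
    sum(cummin(v > threshold))           # same idea as the dplyr answer
  })
  data.frame(id = names(counts), output = as.vector(counts))
}

res0 <- count_recent_run(data)                    # threshold 0
res4 <- count_recent_run(data, threshold = 0.4)   # stricter threshold
print(res0)
print(res4)
```

With `threshold = 0.4`, `b` drops to 0 because its 2001 value (0.4) is not strictly larger than the threshold, while `c` keeps its two most recent years (0.5, 0.5).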