I need to count events with two specific conditions and aggregate by year. My data example is below:
year <- c(rep(1981,20))
k1 <- c(rep(NA,5),rep("COLD",4),rep(NA,4),"COLD",NA,"COLD",rep(NA,4))
k2 <- c(rep(NA,10),rep("COLD",2),rep(NA,8))
k3 <- c(rep(NA,3),"COLD",rep(NA,16))
k4 <- c(rep(NA,3),rep("COLD",5),rep(NA,2),rep("COLD",5),NA,rep("COLD",4))
k5 <- c(rep(NA,3),"COLD",rep(NA,3),"COLD",rep(NA,3),"COLD",rep(NA,8))
df <- data.frame(year,k1,k2,k3,k4,k5)
I use rle, which I found easy to apply. My code is able to count the number of events with 5 consecutive records "COLD" and do it separately for each year. But here I need to add another condition, that between two separate events (which is 5 or more "COLD") should be at least 3 records "NA" (or three gaps), if less than 3 "NA", then it is the same event. My code:
rle_col = function(k_col, num = 5){
k_col[is.na(k_col)] = "NA" # convert NAs
r = rle(k_col) # run length encoding
which_cold = r$values == "COLD"
sum(r$lengths[which_cold] >= num)
}
result <- aggregate(df[2:6],by = list(df$year), rle_col)
I tried the code below, but unfortunately, it doesn't work as I expected... Any suggestions? THANKS!
rle_col = function(k_col, num = 5, numm = 3){
k_col[is.na(k_col)] = "NA" # convert NAs
r = rle(k_col) # run length encoding
which_cold = r$values == "COLD"
which_gap = r$values == "NA"
sum(r$lengths[which_cold] >= num & r$lengths[which_gap] >= numm)
The result I want should look like this:
year k1 k2 k3 k4 k5
<dbl> <int> <int> <int> <int> <int>
1981 0 0 0 1 0
We may use tidyverse
library(dplyr)
df %>%
group_by(year) %>%
summarise(across(starts_with('k'), rle_col))
# A tibble: 1 × 6
year k1 k2 k3 k4 k5
<dbl> <int> <int> <int> <int> <int>
1 1981 0 0 0 1 0
where rle_col
is
rle_col <- function(k_col, num = 5) {
with(rle(is.na(k_col)), {
i1 <- values
i1[values & lengths <3] <- 'Invalid'
sum(!values & lengths >= 5 &
(lag(i1) != "Invalid"|lead(i1) != "Invalid"), na.rm = TRUE)
})
}