rcountrle

Count events with two specific conditions in R


I need to count events with two specific conditions and aggregate by year. My data example is below:

year <- c(rep(1981,20))
k1 <- c(rep(NA,5),rep("COLD",4),rep(NA,4),"COLD",NA,"COLD",rep(NA,4))
k2 <- c(rep(NA,10),rep("COLD",2),rep(NA,8))
k3 <- c(rep(NA,3),"COLD",rep(NA,16))
k4 <- c(rep(NA,3),rep("COLD",5),rep(NA,2),rep("COLD",5),NA,rep("COLD",4))
k5 <- c(rep(NA,3),"COLD",rep(NA,3),"COLD",rep(NA,3),"COLD",rep(NA,8))

df <- data.frame(year,k1,k2,k3,k4,k5)

I use rle, which I found easy to apply. My code is able to count the number of events with 5 consecutive records "COLD" and do it separately for each year. But here I need to add another condition, that between two separate events (which is 5 or more "COLD") should be at least 3 records "NA" (or three gaps), if less than 3 "NA", then it is the same event. My code:

rle_col = function(k_col, num = 5){
    k_col[is.na(k_col)] = "NA" # convert NAs
    r = rle(k_col) # run length encoding
    which_cold = r$values == "COLD"
    sum(r$lengths[which_cold] >= num)
}

result <- aggregate(df[2:6],by = list(df$year), rle_col)

I tried the code below, but unfortunately, it doesn't work as I expected... Any suggestions? THANKS!

rle_col = function(k_col, num = 5, numm = 3){
    k_col[is.na(k_col)] = "NA" # convert NAs
    r = rle(k_col) # run length encoding
    which_cold = r$values == "COLD"
    which_gap = r$values == "NA"
    sum(r$lengths[which_cold] >= num & r$lengths[which_gap] >= numm)

The result I want should look like this:

     year    k1    k2    k3    k4    k5
     <dbl> <int> <int> <int> <int> <int>
     1981     0     0     0     1     0

Solution

  • We may use tidyverse

    library(dplyr)
    df %>% 
        group_by(year) %>% 
        summarise(across(starts_with('k'), rle_col))
    # A tibble: 1 × 6
       year    k1    k2    k3    k4    k5
      <dbl> <int> <int> <int> <int> <int>
    1  1981     0     0     0     1     0
    

    where rle_col is

    rle_col <-  function(k_col, num = 5) {
    
        with(rle(is.na(k_col)), {
               i1 <- values
                i1[values & lengths <3] <- 'Invalid'
                sum(!values & lengths >= 5 & 
            (lag(i1) != "Invalid"|lead(i1) != "Invalid"), na.rm = TRUE)
    
                 })
     }