r, dplyr, apply, sequential-number

Filter data frame for the longest sequence of repeated numbers by row in R


I am trying to create a "filter-by" matrix that I can use to isolate rows of data in my data frame. Each row of the matrix should contain only the values that belong to the longest consecutive run of the same number in that row, with everything else set to zero. After searching around, I think rle is the function to use, but my attempt does not give me what I am after. Here is an example of my code and results. Suggestions and solutions would be very much appreciated. Thank you!

SAMPLE DATA:

    a<- c(1,0,1,1,1,1,0,0)
    b<- c(0,0,0,1,1,1,0,1)
    c<- c(0,0,1,1,0,0,0,1)
    d<- c(1,0,0,1,1,1,1,0)
    e<- c(1,0,0,1,0,0,1,1)
    f<- c(0,0,0,1,1,1,0,1)
    g<- c(0,0,1,1,0,0,0,1)
    test.data <- data.frame(cbind(a,b,c,d,e,f,g))

    # > test.data
    #   a b c d e f g
    # 1 1 0 0 1 1 0 0
    # 2 0 0 0 0 0 0 0
    # 3 1 0 1 0 0 0 1
    # 4 1 1 1 1 1 1 1
    # 5 1 1 0 1 0 1 0
    # 6 1 1 0 1 0 1 0
    # 7 0 0 0 1 1 0 0
    # 8 0 1 1 0 1 1 1

SAMPLE CODE FOR ATTEMPTED SOLUTION:

    result <- data.frame(lapply(test.data, function(x) {
      r <- rle(x)
      r$values[r$lengths!=max(r$lengths)]==1
      r2=inverse.rle(r)
      r2
    }))

RESULT I GET (looks like exact copy of what went in?):

    # > result
    #    a b c d e f g
    # 1  1 0 0 1 1 0 0
    # 2  0 0 0 0 0 0 0
    # 3  1 0 1 0 0 0 1
    # 4  1 1 1 1 1 1 1
    # 5  1 1 0 1 0 1 0
    # 6  1 1 0 1 0 1 0
    # 7  0 0 0 1 1 0 0
    # 8  0 1 1 0 1 1 1

THIS IS THE RESULT I WANT TO GET (T/F can be used instead of 1 and 0, if easier):

    # > result
    #    a b c d e f g
    # 1  0 0 0 1 1 0 0
    # 2  0 0 0 0 0 0 0
    # 3  0 0 0 0 0 0 0
    # 4  1 1 1 1 1 1 1
    # 5  1 1 0 0 0 0 0
    # 6  1 1 0 0 0 0 0
    # 7  0 0 0 1 1 0 0
    # 8  0 0 0 0 1 1 1

PLEASE ADVISE!


Solution

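    library(magrittr)  # provides %>% and the exposition pipe %$%
    
    val <- 1  # the value whose longest run should be kept
    
    test.data %>% 
        apply(1, function(x){
          # %$% exposes the rle components `lengths` and `values` by name
          rle(x) %$% { 
            # No run of `val` in this row: return all zeros
            if(all(values != val)) rep(0, length(x))
            else {
              # Length of the longest run of `val`
              m      <- max(lengths[values == val]) 
              # Get only longest sequences (runs of length 1 are dropped)
              values <- (lengths == m & values == val)*values*(m > 1)
              # Keep only the first of them
              values[seq_along(values) != which(values == val)[1]] <- 0
              # Expand the runs back to a full-length row
              rep(values, lengths)
            }
        }}) %>% t  # apply() returns one column per row, so transpose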
    
    #      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    # [1,]    0    0    0    1    1    0    0
    # [2,]    0    0    0    0    0    0    0
    # [3,]    0    0    0    0    0    0    0
    # [4,]    1    1    1    1    1    1    1
    # [5,]    1    1    0    0    0    0    0
    # [6,]    1    1    0    0    0    0    0
    # [7,]    0    0    0    1    1    0    0
    # [8,]    0    0    0    0    1    1    1
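
For comparison, here is a minimal base R sketch of the same idea without magrittr. The attempt in the question returns a copy of the input because `r$values[...]==1` is a comparison whose result is discarded rather than an assignment, and because `lapply` on a data frame iterates over columns, not rows. The helper name `keep_longest_run` and its `min_len` argument are hypothetical names introduced here for illustration. `min_len = 2` is an assumption based on the desired output, where a row whose longest run of 1s has length 1 becomes all zeros. The sketch also assumes the data contain no `NA` values.

```r
# Hypothetical helper (not from the original answer): keep only the first
# longest run of `val` in a vector and set everything else to zero.
keep_longest_run <- function(x, val = 1, min_len = 2) {
  r <- rle(unname(x))                 # run-length encoding of the row
  is_val <- r$values == val           # which runs consist of `val`
  out <- numeric(length(r$values))    # one entry per run, all zeros
  if (any(is_val)) {
    m <- max(r$lengths[is_val])       # length of the longest run of `val`
    if (m >= min_len) {
      # Mark only the first run that reaches the maximum length
      first <- which(is_val & r$lengths == m)[1]
      out[first] <- val
    }
  }
  rep(out, r$lengths)                 # expand runs back to full length
}

# Sample data from the question
test.data <- data.frame(
  a = c(1,0,1,1,1,1,0,0),
  b = c(0,0,0,1,1,1,0,1),
  c = c(0,0,1,1,0,0,0,1),
  d = c(1,0,0,1,1,1,1,0),
  e = c(1,0,0,1,0,0,1,1),
  f = c(0,0,0,1,1,1,0,1),
  g = c(0,0,1,1,0,0,0,1)
)

# apply() over rows returns one column per row, so transpose the result
result <- t(apply(test.data, 1, keep_longest_run))
colnames(result) <- names(test.data)
result
```

Because `result` is a 0/1 matrix with the same shape as the data, one way to use it as a filter is elementwise multiplication, for example `as.matrix(test.data) * result`.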