r vector sequence run-length-encoding

Efficiently find the first index of the last sequence of 1s


I have the following vectors with 0s and 1s:

test1 <- c(rep(0,20),rep(1,5),rep(0,10),rep(1,15)) 

test1
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
                                                                          ^
test2 <- c(rep(0,8),rep(1,4),rep(0,5),rep(1,5),rep(0,6),rep(1,10),rep(0,2)) 

test2
[1] 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
                                                            ^

I need to find the index of the first 1 in the last sequence of 1s (indicated by ^ in the code above). I have a solution (below), but it doesn't perform well. How could I improve the performance?

For test1 and test2, the expected output is 36 and 29, respectively.

Here is a sub-optimal solution:

temp1 <- cumsum(test1)
which(temp1==max(temp1[duplicated(temp1)&temp1!=max(temp1)]+1))[1]
[1] 36

temp2 <- cumsum(test2)
which(temp2==max(temp2[duplicated(temp2)&temp2!=max(temp2)]+1))[1]
[1] 29

Note: The actual vectors have ~10k elements.


Solution

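  • Another way, with which + diff: take the positions of all 1s, then pick the last position whose gap to the previous 1 is not exactly 1 (that is, the start of the last run).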

    idx <- which(test1 == 1)
    idx[tail(which(c(0, diff(idx)) != 1), 1)]
    #[1] 36
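
For reuse on both vectors, the same idea can be wrapped in a function. This is only a minimal sketch: the names first_of_last_run and first_of_last_run_rle are hypothetical (not from the original answer), the NA return for a vector without any 1s is an assumption about the desired behaviour, and the rle() variant is included purely as a cross-check, with no claim about which is faster.

```r
# Hypothetical helper wrapping the which + diff idea from above.
first_of_last_run <- function(x) {
  idx <- which(x == 1)                          # positions of all 1s
  if (length(idx) == 0L) return(NA_integer_)    # assumption: NA when there are no 1s
  # A run starts wherever the gap to the previous 1 is not exactly 1;
  # the leading 0 marks the very first 1 as a run start.
  starts <- which(c(0, diff(idx)) != 1)
  idx[tail(starts, 1)]
}

# Cross-check via run-length encoding (base R rle()).
first_of_last_run_rle <- function(x) {
  r <- rle(x)
  ends <- cumsum(r$lengths)                     # end index of each run
  starts <- ends - r$lengths + 1L               # start index of each run
  ones <- which(r$values == 1)                  # which runs consist of 1s
  if (length(ones) == 0L) return(NA_integer_)
  starts[tail(ones, 1)]
}

test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))
test2 <- c(rep(0, 8), rep(1, 4), rep(0, 5), rep(1, 5), rep(0, 6), rep(1, 10), rep(0, 2))

first_of_last_run(test1)      # 36
first_of_last_run(test2)      # 29
first_of_last_run_rle(test1)  # 36
first_of_last_run_rle(test2)  # 29
```

Both versions are fully vectorised, so they should be comfortable with vectors of ~10k elements; timing them on the real data (for example with system.time) would show whether the difference matters.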