
Is there a dplyr equivalent to data.table::rleid?


data.table offers a convenient function, rleid(), for run-length encoding: it returns an id that increments every time the value changes.

library(data.table)
DT = data.table(grp=rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2)), value=1:10)
rleid(DT$grp)
# [1] 1 1 2 2 3 3 3 4 5 5

I can mimic this in base R with:

df <- data.frame(DT)
rep(seq_along(rle(df$grp)$values), times = rle(df$grp)$lengths)
# [1] 1 1 2 2 3 3 3 4 5 5
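The base R one-liner can be wrapped in a small reusable function so that rle() is computed only once. This is a minimal sketch; `rle_id()` is a hypothetical helper name, not part of base R, dplyr, or data.table. Note that rle() rejects factors (convert with as.character() first), and it starts a new run at every NA, so its NA handling is not the same as rleid()'s.

```r
# Hypothetical helper: run ids built from base R's rle().
# seq_along(r$lengths) numbers the runs 1..k, and rep() repeats
# each run number by the length of that run.
rle_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), times = r$lengths)
}

df <- data.frame(
  grp = rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2)),
  value = 1:10,
  stringsAsFactors = FALSE  # rle() needs a plain vector, not a factor
)

rle_id(df$grp)
```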

Is there a dplyr equivalent, or is the "best" way to get rleid behavior with dplyr to do something like the following?

library(dplyr)

my_rleid = rep(seq_along(rle(df$grp)$values), times = rle(df$grp)$lengths)

df %>%
  mutate(rleid = my_rleid)
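For dplyr versions that predate a built-in function, a common idiom is to compare each value with the previous one and take a cumulative sum of the changes, all inside mutate(). This is a sketch of that idiom, not an official dplyr API; it assumes `grp` has no NA values, because an NA comparison makes cumsum() return NA for every later row.

```r
library(dplyr)

df <- data.frame(
  grp = rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2)),
  value = 1:10,
  stringsAsFactors = FALSE
)

# A new run starts wherever grp differs from the previous row.
# lag(default = first(grp)) makes row 1 compare against itself (FALSE),
# so cumsum() starts at 0 and + 1L shifts the ids to start at 1.
out <- df %>%
  mutate(rleid = cumsum(grp != lag(grp, default = first(grp))) + 1L)

out
```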

Solution

  • dplyr 1.1.0 added the function consecutive_id(), modeled after data.table::rleid(), with the same support for multiple vectors and the same treatment of NA values.

     library(dplyr)
     
     DT %>%
       mutate(id = consecutive_id(grp)) 
    
        grp value id
     1:   A     1  1
     2:   A     2  1
     3:   B     3  2
     4:   B     4  2
     5:   C     5  3
     6:   C     6  3
     7:   C     7  3
     8:   A     8  4
     9:   B     9  5
    10:   B    10  5
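To illustrate the multiple-vector support mentioned above, here is a small sketch assuming dplyr >= 1.1.0; the data frame and its columns `x` and `y` are made up for illustration. A new id starts whenever any of the supplied columns changes from one row to the next.

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

df <- data.frame(
  x = c("A", "A", "A", "B"),
  y = c(1, 1, 2, 2)
)

# Rows 1-2 share (A, 1); row 3 changes y; row 4 changes x.
out <- df %>%
  mutate(id = consecutive_id(x, y))

out
```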