r group-by sequence rowid

R: How to start a new sub_id each time a new sequence begins


Suppose I have data as follows:

library(tibble)

df <- tibble(
    A = c(1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5),
    B = c(1, 1, 2, 1, 2, 3, 1, 2, 1, 1, 1, 2, 3, 4, 1, 1)
)

i.e.,

# A tibble: 16 x 2
       A     B
   <dbl> <dbl>
 1     1     1
 2     2     1
 3     2     2
 4     2     1
 5     2     2
 6     2     3
 7     3     1
 8     3     2
 9     3     1
10     3     1
11     4     1
12     4     2
13     4     3
14     4     4
15     4     1
16     5     1

How do I create a sub_id that increments each time a new sequence begins (i.e., each time B restarts at 1) within the groups defined by variable A? The desired result is:

tibble(
    A = c(1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5),
    B = c(1, 1, 2, 1, 2, 3, 1, 2, 1, 1, 1, 2, 3, 4, 1, 1),
    sub_id = c(1, 1, 1, 2, 2, 2, 1, 1, 2, 3, 1, 1, 1, 1, 2, 1)
)
# A tibble: 16 x 3
       A     B sub_id
   <dbl> <dbl>  <dbl>
 1     1     1      1
 2     2     1      1
 3     2     2      1
 4     2     1      2
 5     2     2      2
 6     2     3      2
 7     3     1      1
 8     3     2      1
 9     3     1      2
10     3     1      3
11     4     1      1
12     4     2      1
13     4     3      1
14     4     4      1
15     4     1      2
16     5     1      1

Hopefully that’s well defined. I suppose I’m after a kind of inverse to row_number().

Thanks in advance,

James.


Solution

  • Using base R

    df$sub_id <- with(df, ave(B == 1, A, FUN = cumsum))
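
To see why this works, here is a minimal self-contained sketch that rebuilds the question's data and checks the result against the desired sub_id column. It assumes that every sequence starts with B equal to 1, which holds for the sample data but may not hold for other inputs. It uses a plain data.frame rather than a tibble so that no packages are needed; the name `expected` is introduced here only for the check.

```r
# Rebuild the example data from the question as a plain data frame
# (base R only, so no packages are required).
df <- data.frame(
  A = c(1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5),
  B = c(1, 1, 2, 1, 2, 3, 1, 2, 1, 1, 1, 2, 3, 4, 1, 1)
)

# B == 1 flags the first row of each sequence (assumption: sequences
# always restart at 1). ave() applies cumsum() to those flags separately
# within each value of A, so the running count resets for every group.
df$sub_id <- with(df, ave(B == 1, A, FUN = cumsum))

# Desired sub_id values, copied from the question.
expected <- c(1, 1, 1, 2, 2, 2, 1, 1, 2, 3, 1, 1, 1, 1, 2, 1)

# Stops with an error if the computed column differs from the desired one.
stopifnot(all(df$sub_id == expected))
```

If a sequence could start at a value other than 1, the `B == 1` flag would need to be replaced by some other restart test. One possibility is flagging rows where B does not increase relative to the previous row in the group, but that goes beyond what the answer above covers.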