rdataframedplyr

Assign values of a new column based on the frequency of a special pattern in dataframe


I would like to create another column of a data frame that groups each member in the first column based on the order.

Here is a reproducible demo:

df1=c("Alex","23","ID #:123", "John","26","ID #:564")
df1=data.frame(df1)
library(dplyr)
library(data.table)
df1 %>% mutate(group= ifelse(df1 %like% "ID #:",1,NA ) )

This was the output from the demo:

df1        group
1     Alex    NA
2       23    NA
3 ID #:123     1
4     John    NA
5       26    NA
6 ID #:564     1

This is what I want:

 df1         group
 1     Alex     1
 2       23     1
 3 ID #:123     1
 4     John     2
 5       26     2
 6 ID #:564     2

So I want to have a group column indicates each member in order.

I appreciate in advance for any reply or thoughts!


Solution

  • Shift the condition with lag first and then do a cumsum:

    df1 %>% 
        mutate(group= cumsum(lag(df1 %like% "ID #:", default = 1)))
    
    #       df1 group
    #1     Alex     1
    #2       23     1
    #3 ID #:123     1
    #4     John     2
    #5       26     2
    #6 ID #:564     2
    

    Details:

    df1 %>% 
        mutate(
            # calculate the condition
            cond = df1 %like% "ID #:", 
            # shift the condition down and fill the first value with 1
            lag_cond = lag(cond, default = 1),
            # increase the group when the condition is TRUE (ID encountered)
            group= cumsum(lag_cond))
    
    #       df1  cond lag_cond group
    #1     Alex FALSE     TRUE     1
    #2       23 FALSE    FALSE     1
    #3 ID #:123  TRUE    FALSE     1
    #4     John FALSE     TRUE     2
    #5       26 FALSE    FALSE     2
    #6 ID #:564  TRUE    FALSE     2