I would like to create another column of a data frame that groups each member in the first column based on the order.
Here is a reproducible demo:
df1=c("Alex","23","ID #:123", "John","26","ID #:564")
df1=data.frame(df1)
library(dplyr)
library(data.table)
df1 %>% mutate(group= ifelse(df1 %like% "ID #:",1,NA ) )
This was the output from the demo:
df1 group
1 Alex NA
2 23 NA
3 ID #:123 1
4 John NA
5 26 NA
6 ID #:564 1
This is what I want:
df1 group
1 Alex 1
2 23 1
3 ID #:123 1
4 John 2
5 26 2
6 ID #:564 2
So I want to have a group column indicates each member in order.
I appreciate in advance for any reply or thoughts!
Shift the condition with lag
first and then do a cumsum
:
df1 %>%
mutate(group= cumsum(lag(df1 %like% "ID #:", default = 1)))
# df1 group
#1 Alex 1
#2 23 1
#3 ID #:123 1
#4 John 2
#5 26 2
#6 ID #:564 2
Details:
df1 %>%
mutate(
# calculate the condition
cond = df1 %like% "ID #:",
# shift the condition down and fill the first value with 1
lag_cond = lag(cond, default = 1),
# increase the group when the condition is TRUE (ID encountered)
group= cumsum(lag_cond))
# df1 cond lag_cond group
#1 Alex FALSE TRUE 1
#2 23 FALSE FALSE 1
#3 ID #:123 TRUE FALSE 1
#4 John FALSE TRUE 2
#5 26 FALSE FALSE 2
#6 ID #:564 TRUE FALSE 2