I have this type of data in an ordered R dataframe.
set.seed(25)
date <- sort(as.Date(sample( as.numeric(as.Date("2019-01-01")): as.numeric(as.Date("2021-03-31")), 10,
replace = T),
origin = '1970-01-01'))
type <- c("Football", "Football", "Rugby", "Football", "Hockey", "Tennis", "Hockey", "Basketball", "Basketball", "Rugby")
id <- c("1","1","1","1","2","2","3","4","4","5")
df <- data.frame(date,id, type)
date id type
2019-04-09 1 Football
2019-04-13 1 Football
2019-04-20 1 Rugby
2019-04-21 1 Football
2019-05-31 2 Hockey
2020-02-09 2 Tennis
2020-03-08 3 Hockey
2020-03-24 4 Basketball
2020-08-18 4 Football
2020-11-01 5 Rugby
The result I'm trying to get at is this:
date id type type_2
2019-04-09 1 Football Football
2019-04-13 1 Football Football
2019-04-20 1 Rugby Multi
2019-04-21 1 Football Multi
2019-05-31 2 Hockey Hockey
2020-02-09 2 Tennis Multi
2020-03-08 3 Hockey Hockey
2020-03-24 4 Basketball Basketball
2020-08-18 4 Basketball Basketball
2020-11-01 5 Rugby Rugby
Basically, the first sport in time an id practices stays if the next sport he practices is the same as the previous one, type_2 remains the same, but as soon as he changes sport later on, he changes to multi for the rest of his values later on.
I tried do this with lag()
, lead()
and if_else()
in dplyr
but the results never come out the way I want.
You may use rleid
from data.table
to generate the running length id for type
variable in each id
. Everything after the first change becomes "Multi"
.
library(data.table)
setDT(df)[, type2 := replace(type, rleid(type) > 1, 'Multi'), id]
df
# date id type type2
# 1: 2019-02-18 1 Football Football
# 2: 2019-02-28 1 Football Football
# 3: 2019-03-13 1 Rugby Multi
# 4: 2019-09-29 1 Football Multi
# 5: 2019-10-09 2 Hockey Hockey
# 6: 2020-03-19 2 Tennis Multi
# 7: 2020-04-21 3 Hockey Hockey
# 8: 2020-06-19 4 Basketball Basketball
# 9: 2020-09-08 4 Basketball Basketball
#10: 2020-10-08 5 Rugby Rugby
If you prefer to write it in dplyr
-
library(dplyr)
df %>%
group_by(id) %>%
mutate(type2 = replace(type, rleid(type) > 1, 'Multi')) %>%
ungroup