I hope you can help me with this problem, I have the following data like this:
ID,colour
1,base_yellow
1,blue
1,base_red
1,blue
1,pink
1,blue
1,base_yellow
2,base_yellow
2,blue
2,base_red
2,blue
2,pink
2,blue
2,base_yellow
3,base_yellow
3,blue
3,pink
3,blue
3,base_yellow
4,base_yellow
4,blue
4,green
4,blue
4,green
4,blue
4,pink
4,blue
4,base_yellow
Every time meet with base (base_yellow, base_red), it creates new group, the output that is expected as shown below, which gives a new variable:
ID,colour
1,base_yellow; blue; base_red
1,base_red; blue; pink;blue;base_yellow
2,base_yellow; blue; base_red
2,base_red; blue; pink;blue; base_yellow
3,base_yellow;blue;pinkblue;base_yellow
4,base_yellow; blue;green;blue;green;blue;pink;blue;base_yellow
This is something you might be able to adapt for your needs.
First, create a vector vec
that includes row positions where colour
starts with "base".
Then, you can use map2_dfr
from purrr
that will provide colour
that ranges from start to end positions based on vec
. This will help with situations where the same colour
is used in more than one row in the end. A grouping variable group
is also created in this step.
After grouping by group
, you can keep only colour
groups that have more than one colour
and str_c
to collapse them together for the same group
.
library(tidyverse)
vec <- which(grepl("^base", df$colour))
map2_dfr(
vec[-length(vec)],
vec[-1],
~df[.x:.y, ],
.id = "group"
) %>%
group_by(group) %>%
filter(n_distinct(colour) > 1) %>%
summarise(ID = first(ID), colour = str_c(colour, collapse = "; ")) %>%
select(-group)
Output
ID colour
<int> <chr>
1 1 base_yellow; blue; base_red
2 1 base_red; blue; pink; blue; base_yellow
3 2 base_yellow; blue; base_red
4 2 base_red; blue; pink; blue; base_yellow
5 3 base_yellow; blue; pink; blue; base_yellow
6 4 base_yellow; blue; green; blue; green; blue; pink; blue; base_yellow