Data I have:
A | B |
---|---|
1 | a |
2 | c |
2 | e |
3 | f |
4 | h |
5 | c |
5 | e |
What I want:
If all values of B in different groups of A match, then they’re in the same group. In this case, all B values (c and e) match for 2 and 5.
A | B | Group |
---|---|---|
1 | a | 1 |
2 | c | 2 |
2 | e | 2 |
3 | f | 3 |
4 | h | 4 |
5 | c | 2 |
5 | e | 2 |
Code I attempted:
library(readxl)
library(dplyr)
library(stringr)
data1 <- read_excel("testing.xlsx")
data2 <- data1 %>%
group_by(A) %>%
group_by(B) %>%
mutate(Group = cur_group_id()) %>%
ungroup()
What I’m getting from this code:
A | B | Group |
---|---|---|
1 | a | 1 |
2 | c | 2 |
2 | e | 3 |
3 | f | 4 |
4 | h | 5 |
5 | c | 2 |
5 | e | 3 |
NOTE: The original data I am manipulating has been left-joined and then grouped.
One trick is to group by the list of elements that share the same "A". You can then use cur_group_id
properly. I add an arrange
call first to make sure that identical groups are recognized as such even if the order is different in the original data.
library(dplyr)
library(tidyr)
dat |>
arrange(A, B) |>
summarise(B = list(B), .by = A) |>
mutate(Group = cur_group_id(), .by = B) |>
unnest_longer(B)
# A B Group
# 1 1 a 1
# 2 2 c 2
# 3 2 e 2
# 4 3 f 3
# 5 4 h 4
# 6 5 c 2
# 7 5 e 2