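I would like to do the same thing as distinct(), but for whole groups: whenever several groups contain exactly the same set of procedures, keep only the first of them. Here is an example: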
data <- data.frame(
group = c(1, 1, 2, 3, 3, 4, 4, 5, 5),
procedure = c("A", "B", "A", "A", "B", "A", "X", "A", "X")
)
group procedure
1 1 A
2 1 B
3 2 A
4 3 A
5 3 B
6 4 A
7 4 X
8 5 A
9 5 X
I am expecting this (note: group_id is only an interim column and not important):
group procedure group_id
<dbl> <chr> <int>
1 1 A 2
2 1 B 2
3 2 A 1
4 4 A 3
5 4 X 3
I use this working code:
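library(dplyr)
library(tidyr)
data %>%
  # one sorted, comma-separated string of procedures per group
  summarise(procedure = toString(sort(procedure)), .by = group) %>%
  # identical procedure sets get the same id
  mutate(group_id = as.integer(factor(procedure))) %>%
  # keep only the first group for each id
  distinct(group_id, .keep_all = TRUE) %>%
  # split the string back into one row per procedure
  separate_rows(procedure)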
Is there a more direct method? For context, my dataset has 23,000 rows and many groups, and I need to identify the main member of each group and evaluate it. So I'm looking for an efficient way to find and assess all unique groups. What approach would you suggest?
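For the "main member" step, here is a minimal base-R sketch. It assumes that the main member is simply the first group (lowest group number) with a given set of procedures; the names keys, groups, and lookup are illustrative only. It builds one sorted key per group and maps every group to the first group that shares its key:

```r
# Example data from the question
data <- data.frame(
  group = c(1, 1, 2, 3, 3, 4, 4, 5, 5),
  procedure = c("A", "B", "A", "A", "B", "A", "X", "A", "X")
)

# One canonical key per group: its sorted procedures joined into one string
keys <- vapply(split(data$procedure, data$group),
               function(p) paste(sort(p), collapse = ", "),
               character(1))
groups <- as.numeric(names(keys))

# match() returns the first position of each key, i.e. the first group
# (in split() order, which is by group value) with the same procedure set
lookup <- data.frame(
  group = groups,
  key = unname(keys),
  representative = groups[match(keys, keys)]
)
lookup
```

For the example data, groups 3 and 5 map to representatives 1 and 4, so lookup can be joined back to the full data to evaluate each group against its representative.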
I'm not sure whether this is short enough for you, but you can store each group's sorted procedures in a list column and drop the duplicated ones:
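data %>%
  # store each group's sorted procedures as one list element
  summarise(procedure = list(sort(procedure)), .by = group) %>%
  # drop groups whose procedure set has already appeared
  filter(!duplicated(procedure)) %>%
  unnest(procedure)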
which gives
# A tibble: 5 × 2
group procedure
<dbl> <chr>
1 1 A
2 1 B
3 2 A
4 4 A
5 4 X
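The same list-plus-duplicated() idea can be sketched in base R without dplyr or tidyr. This is a minimal sketch, not a drop-in replacement: split() orders groups by their sorted group values rather than by first appearance, which makes no difference for this example data because it is already sorted by group.

```r
# Example data from the question
data <- data.frame(
  group = c(1, 1, 2, 3, 3, 4, 4, 5, 5),
  procedure = c("A", "B", "A", "A", "B", "A", "X", "A", "X")
)

# Sorted procedure vector per group (a named list, one element per group)
sets <- lapply(split(data$procedure, data$group), sort)

# duplicated() also works on lists: an element is TRUE if an identical
# procedure vector appeared earlier, so only the first group of each set is kept
keep <- as.numeric(names(sets)[!duplicated(sets)])

# Return the original rows of the kept groups
result <- data[data$group %in% keep, ]
result
```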