Data I have:
A | B |
---|---|
1 | a |
2 | c |
2 | e |
3 | f |
4 | h |
5 | c |
5 | e |
What I want:
A | B | Group |
---|---|---|
1 | a | 1 |
2 | c | 2 |
2 | e | 2 |
3 | f | 3 |
4 | h | 4 |
5 | c | 2 |
5 | e | 2 |
Code I attempted:
library(readxl)
library(dplyr)
library(stringr)
data1 <- read_excel("testing.xlsx")
data2 <- data1 %>%
  group_by(A) %>%
  group_by(B) %>%
  mutate(Group = cur_group_id()) %>%
  ungroup()
What I’m getting from this code:
A | B | Group |
---|---|---|
1 | a | 1 |
2 | c | 2 |
2 | e | 3 |
3 | f | 4 |
4 | h | 5 |
5 | c | 2 |
5 | e | 3 |
EDIT: I get the error — “Can’t supply ‘.by’ when ‘.data’ is a grouped data frame.” for all of the comments below. The original data I am manipulating has been left-joined and then grouped. How do I approach this?
You can try the approach below:
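library(dplyr)  # `.by` requires dplyr >= 1.1.0

df %>%
  left_join(
    (.) %>%
      # one row per A: its sorted B values collapsed into a single string
      summarise(group = as.factor(toString(sort(B))), .by = A) %>%
      # convert the factor levels into integer group ids
      mutate(group = as.integer(group))
  )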
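If df is still grouped (as described in the question's edit), dplyr rejects .by with the error quoted there. One possible workaround is to call ungroup() before summarising. The following is only a minimal sketch, assuming the toy data from the question and that dropping the existing grouping is acceptable for your real data. The names df_flat, keys, and key are illustrative, and match(key, unique(key)) is used here in place of the factor conversion above.

```r
library(dplyr)

# Toy data from the question, deliberately left grouped to mimic the
# left-joined-then-grouped data frame described in the edit (assumption).
df <- tibble(
  A = c(1, 2, 2, 3, 4, 5, 5),
  B = c("a", "c", "e", "f", "h", "c", "e")
) %>%
  group_by(A)

# Drop the grouping first so that `.by` is accepted.
df_flat <- ungroup(df)

# One row per A, keyed by the sorted set of B values it contains;
# identical keys receive the same integer id.
keys <- df_flat %>%
  summarise(key = toString(sort(B)), .by = A) %>%
  mutate(Group = match(key, unique(key))) %>%
  select(A, Group)

# Attach the group id back onto every original row.
result <- left_join(df_flat, keys, by = "A")
result$Group
```

If the grouping is needed later, it can be reapplied with group_by() after the join.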
Alternatively, you can use membership() from the igraph package:
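library(dplyr)
library(igraph)

df %>%
  mutate(group = {
    (.) %>%
      graph_from_data_frame() %>%  # treat each (A, B) row as an edge
      components() %>%             # find the connected components
      membership()                 # named vector: vertex name -> component id
  }[B])                            # look up each row's component by its B value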
which gives
A B group
1 1 a 1
2 2 c 2
3 2 e 2
4 3 f 3
5 4 h 4
6 5 c 2
7 5 e 2
With igraph, you can also plot the graph (in case it is of interest):
df %>%
  graph_from_data_frame() %>%
  plot()