Take the following data. I want to add a column indicating which group of connected values each row is part of.
library(tidyverse)
df <- structure(list(fruit = c("apple", "apple", "apple", "pear", "pear",
"banana", "banana", "peach", "cherry"), name = c("joe", "sally",
"steve", "pete", "kate", "george", "alex", "alex", "alex")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -9L))
df
# A tibble: 9 × 2
  fruit  name
  <chr>  <chr>
1 apple  joe
2 apple  sally
3 apple  steve
4 pear   pete
5 pear   kate
6 banana george
7 banana alex
8 peach  alex
9 cherry alex
Here is the kind of output I'm looking for. Groups 1 and 2 are straightforward: the rows in each are joined by a common fruit value.
Group 3 is more complicated. George is connected to banana. Banana is connected to Alex, who is also connected to peach and cherry. So group 3 contains George, Alex, banana, peach, and cherry.
# A tibble: 9 × 3
  fruit  name   group
  <chr>  <chr>  <chr>
1 apple  joe    group1
2 apple  sally  group1
3 apple  steve  group1
4 pear   pete   group2
5 pear   kate   group2
6 banana george group3
7 banana alex   group3
8 peach  alex   group3
9 cherry alex   group3
Essentially, the group field needs to contain a common ID for all the values that would be connected in a network graph, like so:
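library(ggraph)  # ggraph is not attached by library(tidyverse)

# Each row is an edge between a fruit node and a name node
tidygraph::as_tbl_graph(df) %>%
  ggraph(layout = "tree") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_label(aes(label = name))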
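Since the graph above is already built with tidygraph, the component ID can also be attached to each node there and joined back onto the rows. This is a sketch that assumes a reasonably recent tidygraph: group_components() is a tidygraph function, but its numbering is not guaranteed to match igraph's membership() numbering, so only the grouping itself (not the exact group labels) should be expected to match the desired output. The names node_groups and df_grouped are just illustrative.

```r
library(dplyr)
library(tidygraph)

# Same data as in the question (plain data.frame to keep the sketch minimal)
df <- data.frame(
  fruit = c("apple", "apple", "apple", "pear", "pear",
            "banana", "banana", "peach", "cherry"),
  name  = c("joe", "sally", "steve", "pete", "kate",
            "george", "alex", "alex", "alex"),
  stringsAsFactors = FALSE
)

# Build the graph (fruits and names both become nodes, stored in `name`),
# label every node with its weakly connected component, then pull the
# node table back out as a tibble.
node_groups <- as_tbl_graph(df) %>%
  activate(nodes) %>%
  mutate(group = group_components(type = "weak")) %>%
  as_tibble()

# Every row (edge) lies inside a single component, so looking up the
# row's fruit node is enough to label the row. Group numbers may differ
# from the igraph-based numbering.
df_grouped <- df %>%
  left_join(node_groups, by = c("fruit" = "name")) %>%
  mutate(group = paste0("group", group))

df_grouped
```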
You could try components() from igraph:
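library(igraph)

df %>%
  mutate(group = paste0("group", {
    # `.` is df: each row becomes an edge from a fruit to a name
    graph_from_data_frame(.) %>%
      # weakly connected components (edge direction is ignored by default)
      components() %>%
      # named vector: component ID for every fruit and name vertex
      membership() %>%
      # look up each row's fruit to get its component ID
      `[`(fruit)
  }))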
which gives
# A tibble: 9 × 3
  fruit  name   group
  <chr>  <chr>  <chr>
1 apple  joe    group1
2 apple  sally  group1
3 apple  steve  group1
4 pear   pete   group2
5 pear   kate   group2
6 banana george group3
7 banana alex   group3
8 peach  alex   group3
9 cherry alex   group3
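If adding igraph as a dependency isn't desirable, the same grouping can be computed in base R with a small union-find (disjoint-set) structure. This is a swapped-in technique, not a description of how igraph computes components, and connected_groups() is a hypothetical helper written for this sketch: each row merges the set containing its fruit with the set containing its name, and each row is then labelled by the root of its fruit's set, numbered in order of first appearance. It skips path compression, so igraph is likely the better choice for large graphs.

```r
# Hypothetical helper: label each (from, to) edge with its connected component,
# using a minimal union-find over all distinct values in `from` and `to`.
connected_groups <- function(from, to) {
  nodes  <- unique(c(from, to))
  parent <- setNames(nodes, nodes)  # every value starts as its own root

  # Follow parent links until reaching a value that is its own parent
  find_root <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }

  # Each row is an edge: merge the sets containing its two endpoints
  for (i in seq_along(from)) {
    a <- find_root(from[i])
    b <- find_root(to[i])
    if (a != b) parent[[b]] <- a
  }

  # Number components in order of first appearance among the rows
  roots <- vapply(from, find_root, character(1), USE.NAMES = FALSE)
  paste0("group", match(roots, unique(roots)))
}

df <- data.frame(
  fruit = c("apple", "apple", "apple", "pear", "pear",
            "banana", "banana", "peach", "cherry"),
  name  = c("joe", "sally", "steve", "pete", "kate",
            "george", "alex", "alex", "alex"),
  stringsAsFactors = FALSE
)

df$group <- connected_groups(df$fruit, df$name)
df
```

With the tidyverse loaded, df %>% mutate(group = connected_groups(fruit, name)) should give the same column.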