I am trying to find the clusters/communities to which the nodes in my data belong, given a table of relationships or an adjacency matrix. I define a community as a set of nodes that are connected to each other directly or indirectly. All nodes with direct/indirect connections should end up in the same cluster/community, and nodes with no direct/indirect connection between them must be placed in different clusters/communities.
How do I go about this?
Below is a toy example of what my data look like. Suppose I have two groups, A and B, with four people each (1, 2, 3, 4). I don't know a priori that these groups exist. I have access to a table that gives the relationship of each node to every other node. The relationship column rel takes a value of 1 if a unit (A* or B*) is related to another unit (A** or B**), and 0 otherwise. My task is to find the group to which each unit belongs based on its direct/indirect relationships. In this example, units whose names begin with A are linked exclusively to each other, and the same is true for B.
library(dplyr)
library(tibble)
library(stringr)

x <- tibble(name = rep(c(paste0("A", 1:4), paste0("B", 1:4)), each = 8),
            peer = rep(c(paste0("A", 1:4), paste0("B", 1:4)), times = 8)) |>
  # Relationships
  mutate(rel = case_when(
    # A units are never related to B units
    str_detect(name, "A") & str_detect(peer, "B") ~ 0,
    # Units are related to themselves
    name == peer ~ 1,
    # A1 and A2 are peers; A3 and A4 are peers
    name %in% c("A1", "A2") & peer %in% c("A1", "A2") ~ 1,
    name %in% c("A3", "A4") & peer %in% c("A3", "A4") ~ 1,
    # A2 and A3 are peers
    (name == "A2" & peer == "A3") | (name == "A3" & peer == "A2") ~ 1,
    # B1 and B4 are peers
    (name == "B1" & peer == "B4") | (name == "B4" & peer == "B1") ~ 1,
    # B2 and B3 are each peers with B4 only
    # (%in% rather than ==, which would recycle c("B2", "B3") element-wise)
    (name %in% c("B2", "B3") & peer == "B4") | (name == "B4" & peer %in% c("B2", "B3")) ~ 1,
    # Everything else is unrelated
    TRUE ~ 0))
I have tried igraph::cluster_fast_greedy(), but I noticed that it doesn't always put all nodes with direct/indirect connections in the same group. I'm also open to approaches based on data wrangling / data manipulation rather than social network analysis algorithms.
Edit: After experimenting a bit, I find that igraph::clusters() or igraph::components() seems to give me what I need. I'm open to other ideas if there are any.
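For reference, here is a minimal sketch of the components() route mentioned in the edit. The edges data frame is a hypothetical stand-in for the rel == 1 rows of the toy data (self-links are left out because they do not affect connectivity), and membership_tbl is just an illustrative name. Note that nodes with no links at all would not appear in an edge list, so they would have to be supplied through the vertices argument of graph_from_data_frame().

```r
library(igraph)

# Hypothetical edge list standing in for the rel == 1 rows of the toy data
edges <- data.frame(
  name = c("A1", "A2", "A3", "B1", "B2", "B3"),
  peer = c("A2", "A3", "A4", "B4", "B4", "B4")
)

# Treat relationships as undirected and find the connected components
g <- graph_from_data_frame(edges, directed = FALSE)
comp <- components(g)

# One row per node, with the id of the component it belongs to
membership_tbl <- data.frame(
  name = names(comp$membership),
  community = unname(comp$membership)
)
membership_tbl
```

The resulting table can then be joined back onto the original data by name.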
You can probably use igraph::decompose(), e.g.,
library(dplyr)
library(igraph)

x %>%
  filter(rel > 0) %>%
  graph_from_data_frame() %>%
  decompose()
which gives a list with one graph per (weakly) connected component:
[[1]]
IGRAPH e456d6d DN-- 4 10 --
+ attr: name (v/c), rel (e/n)
+ edges from e456d6d (vertex names):
[1] A1->A1 A1->A2 A2->A1 A2->A2 A2->A3 A3->A2 A3->A3 A3->A4 A4->A3 A4->A4
[[2]]
IGRAPH e456d92 DN-- 4 8 --
+ attr: name (v/c), rel (e/n)
+ edges from e456d92 (vertex names):
[1] B1->B1 B1->B4 B2->B2 B2->B4 B3->B3 B3->B4 B4->B1 B4->B4
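Since the question also asks about a data-wrangling route, below is a rough base R sketch that avoids igraph altogether. It is not part of the approach above: it uses simple label propagation, where every node starts with its own label and both endpoints of each link repeatedly take the smaller of their two labels until nothing changes. The edges data frame is a hypothetical stand-in for the rel == 1 rows of the toy data (self-links omitted), and the names label and communities are illustrative only.

```r
# Hypothetical edge list standing in for the rel == 1 rows of the toy data
edges <- data.frame(
  name = c("A1", "A2", "A3", "B1", "B2", "B3"),
  peer = c("A2", "A3", "A4", "B4", "B4", "B4"),
  stringsAsFactors = FALSE
)

# Start with a distinct integer label for every node
nodes <- sort(unique(c(edges$name, edges$peer)))
label <- setNames(seq_along(nodes), nodes)

# Propagate the smaller label across every edge until a full pass changes nothing
repeat {
  new_label <- label
  for (i in seq_len(nrow(edges))) {
    a <- edges$name[i]
    b <- edges$peer[i]
    m <- min(new_label[[a]], new_label[[b]])
    new_label[[a]] <- m
    new_label[[b]] <- m
  }
  if (identical(new_label, label)) break
  label <- new_label
}

# Recode the surviving labels as consecutive community ids
communities <- data.frame(
  name = names(label),
  community = match(label, unique(label))
)
communities
```

This loops over every edge on each pass, so it is only meant for small tables; for anything large, components() or decompose() will be much faster.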