Tags: r, cluster-analysis, igraph, data-cleaning, network-analysis

Clustering or finding communities based on an adjacency matrix


I am trying to find the clusters/communities to which the nodes in my data belong, given a table of relationships or an adjacency matrix. I define a community as a set of nodes that are connected to each other, directly or indirectly. All nodes in my dataset that have direct or indirect connections should end up in the same cluster/community, and nodes that have no direct or indirect connection to each other must be placed in different clusters/communities.

How do I go about this?
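The definition above (every node reachable through any chain of links belongs to the same group) is what graph theory calls the connected components of a graph. As a minimal base-R sketch, assuming a square 0/1 adjacency matrix with row and column names, a breadth-first search can label the components. The helper name component_labels and the small matrix are hypothetical illustrations, not part of igraph or of the original data.

```r
# Hypothetical helper: label the connected components of a graph given as a
# square 0/1 adjacency matrix with row/column names, using breadth-first search.
component_labels <- function(adj) {
  ids <- rownames(adj)
  # Treat a link recorded in either direction as a connection (weak connectivity).
  linked <- (adj != 0) | t(adj != 0)
  label <- integer(nrow(adj))  # 0 means "not visited yet"
  current <- 0L
  for (start in seq_len(nrow(adj))) {
    if (label[start] != 0L) next
    current <- current + 1L
    label[start] <- current
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      # Unvisited neighbours of v join the current component.
      nbrs <- which(linked[v, ] & label == 0L)
      label[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  setNames(label, ids)
}

# Hypothetical example: A1-A2 and A2-A3 are linked, so A1 and A3 are
# indirectly linked; B1-B2 form a separate group.
nodes <- c("A1", "A2", "A3", "B1", "B2")
adj <- matrix(0, nrow = 5, ncol = 5, dimnames = list(nodes, nodes))
adj["A1", "A2"] <- 1
adj["A2", "A3"] <- 1
adj["B1", "B2"] <- 1
res <- component_labels(adj)
res
```

Because the helper symmetrizes the matrix first, one-sided entries (a link stored only as row i, column j) still join both nodes to the same group.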

Below is a toy example of what my data look like. Suppose I have two groups, A and B, with four people each (1, 2, 3, 4). I do not know a priori that these groups exist. I have access to a table that gives the relationship of each node to every other node: the column rel is 1 if the unit in name (an A* or B* unit) is related to the unit in peer, and 0 otherwise. My task is to find the groups to which all units belong, based on their direct/indirect relationships. In this example, units whose names begin with A are linked only to each other, and the same holds for units whose names begin with B.

library(dplyr)    # tibble(), mutate(), case_when()
library(stringr)  # str_detect()

x <- tibble(name = rep(c(paste0("A", 1:4), paste0("B", 1:4)), each = 8),
            peer = rep(c(paste0("A", 1:4), paste0("B", 1:4)), times = 8)) |>
  # Relationships 
  mutate(rel = case_when(str_detect(name, "A") & str_detect(peer, "B") ~ 0, 
                         # Units are related to themselves
                         name == peer ~ 1, 
                         # A1 and A2 are peers; A3 and A4 are peers
                         name %in% c("A1", "A2") & peer %in% c("A1", "A2") ~ 1,
                         name %in% c("A3", "A4") & peer %in% c("A3", "A4") ~ 1,
                         # A2 and A3 are peers
                         (name == "A2" & peer == "A3") | (name == "A3" & peer == "A2") ~ 1,
                         # B1 and B4 are peers; 
                         (name == "B1" & peer == "B4") | (name == "B4" & peer == "B1") ~ 1,
                         # B2 and B3 are peers with B4 only
                         (name %in% c("B2", "B3") & peer == "B4") |  (name == "B4" & peer == c("B2", "B3")) ~ 1,
                         TRUE ~ 0))
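Since the title mentions an adjacency matrix, here is a sketch of turning a long name/peer/rel table of this shape into a matrix and then into an igraph graph. The small table edges is a hypothetical stand-in for x, built with base R so the block runs on its own; using mode = "max" is one reasonable choice (an assumption, not the only option) when a link may be recorded in only one direction.

```r
library(igraph)

# Hypothetical stand-in for `x`: one row per (name, peer) pair, rel = 1 for a
# link. Here each link is recorded in one direction only.
ids <- c("A1", "A2", "A3", "B1", "B2")
edges <- expand.grid(name = ids, peer = ids, stringsAsFactors = FALSE)
edges$rel <- 0
edges$rel[edges$name == "A1" & edges$peer == "A2"] <- 1
edges$rel[edges$name == "A2" & edges$peer == "A3"] <- 1
edges$rel[edges$name == "B1" & edges$peer == "B2"] <- 1

# Cross-tabulate the long table into a name x peer adjacency matrix.
tab <- xtabs(rel ~ name + peer, data = edges)
adj <- matrix(as.numeric(tab), nrow = nrow(tab), dimnames = dimnames(tab))

# mode = "max" adds an undirected edge when adj[i, j] or adj[j, i] is non-zero;
# diag = FALSE ignores self-links such as A1-A1.
g <- graph_from_adjacency_matrix(adj, mode = "max", diag = FALSE)
components(g)$membership
```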

I have tried igraph::cluster_fast_greedy(), but it does not always put all directly or indirectly connected nodes in the same group. I am also open to approaches based on data wrangling/manipulation rather than social network analysis algorithms.
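A likely reason for this behaviour: cluster_fast_greedy() greedily maximizes modularity, so it is free to split one connected component into several densely linked communities; it is not designed to return connected components (it also only accepts undirected graphs). A minimal sketch with a hypothetical graph of two triangles joined by a single bridge edge shows the difference.

```r
library(igraph)

# Hypothetical graph: two triangles (a-b-c and d-e-f) joined by the bridge c-d.
# Every node can reach every other node, so this is ONE connected component.
g <- graph_from_literal(a-b, b-c, c-a, c-d, d-e, e-f, f-d)

components(g)$no              # number of connected components
fg <- cluster_fast_greedy(g)  # modularity-based communities
membership(fg)                # typically splits the graph at the bridge
```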

Edit: After experimenting a bit, I found that igraph::clusters() or igraph::components() seems to give me what I need. I am open to other ideas, if there are any.
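For a directed graph such as one built from a name/peer table, components() with mode = "weak" ignores edge direction, which matches the "direct or indirect connection" definition; mode = "strong" would instead require directed paths in both directions. clusters() is an older name for components(), and recent igraph documentation recommends components(). The links table below is hypothetical.

```r
library(igraph)

# Hypothetical directed links (name -> peer), recorded in one direction only.
links <- data.frame(
  name = c("A1", "A2", "A3", "B2", "B3"),
  peer = c("A2", "A3", "A4", "B4", "B4")
)
g <- graph_from_data_frame(links, directed = TRUE)

# Weak components ignore direction: A1..A4 form one group, B2, B3, B4 another.
weak <- components(g, mode = "weak")
weak$no          # number of components
weak$csize       # size of each component
weak$membership  # component id of each vertex

# Strong components need directed paths both ways; with no cycles here,
# every vertex ends up in its own component.
components(g, mode = "strong")$no
```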


Solution

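  • You can probably use decompose() from igraph, which splits a graph into its connected components, e.g.,

    library(dplyr)   # filter(), %>%
    library(igraph)  # graph_from_data_frame(), decompose()

    x %>%
        filter(rel > 0) %>%          # keep only the linked pairs
        graph_from_data_frame() %>%  # name -> peer edges; rel becomes an edge attribute
        decompose()                  # weakly connected components by default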
    

    which gives

    [[1]]
    IGRAPH e456d6d DN-- 4 10 --
    + attr: name (v/c), rel (e/n)
    + edges from e456d6d (vertex names):
     [1] A1->A1 A1->A2 A2->A1 A2->A2 A2->A3 A3->A2 A3->A3 A3->A4 A4->A3 A4->A4
    
    [[2]]
    IGRAPH e456d92 DN-- 4 8 --
    + attr: name (v/c), rel (e/n)
    + edges from e456d92 (vertex names):
    [1] B1->B1 B1->B4 B2->B2 B2->B4 B3->B3 B3->B4 B4->B1 B4->B4
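The filter(rel > 0) step works for the toy data because every unit has rel = 1 with itself, so no node disappears. In general, a node whose rel is 0 everywhere would be dropped by the filter and would not appear in any component. A minimal sketch, with hypothetical nodes and rels tables, passes the full node list through the vertices argument of graph_from_data_frame() so such nodes become single-node groups, and turns the result into a plain name/group lookup table.

```r
library(igraph)

# Hypothetical inputs: every node, plus a long name/peer/rel table.
# C1 has no link to anything (rel = 0 everywhere, not even to itself).
nodes <- data.frame(name = c("A1", "A2", "A3", "B1", "B2", "C1"))
rels <- data.frame(
  name = c("A1", "A2", "B1", "C1"),
  peer = c("A2", "A3", "B2", "A1"),
  rel  = c(1, 1, 1, 0)
)

# Keep only real links, but supply the full node list via `vertices`
# so that unlinked nodes survive as single-node components.
g <- graph_from_data_frame(rels[rels$rel > 0, ], directed = TRUE,
                           vertices = nodes)

# Name -> group lookup table (weak components ignore edge direction).
comp <- components(g, mode = "weak")
groups <- data.frame(name = V(g)$name, group = unname(comp$membership))
groups

# decompose() gives the same split as a list of subgraphs.
sapply(decompose(g), vcount)
```

The lookup table can then be joined back onto the original data by name, which keeps the rest of the workflow in ordinary data-wrangling code.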