
friendship network identification in R


I want to identify networks such that all people in the same network are directly or indirectly connected through friendship nominations, while no students from different networks are connected.

I am using the Add Health data. Each student nominates up to 10 friends. Sample data may look like this:

ID    FID_1  FID_2  FID_3  FID_4  FID_5  FID_6  FID_7  FID_8  FID_9  FID_10
1     2      6      7      9      10     NA     NA     NA     NA     NA
2     5      9      12     45     13     90     87     6      NA     NA
3     1      2      4      7      8      9      10     14     16     18
100   110    120    122    125    169    178    190    200    500    520
500   100    110    122    125    169    178    190    200    500    520
700   800    789    900    NA     NA     NA     NA     NA     NA     NA
1000  789    2000   820    900    NA     NA     NA     NA     NA     NA

There are around 85,000 individuals. Could anyone please tell me how I can assign a network ID to each of them? I would like the result to look like the following:

ID   network_ID           ID  network_ID
1     1                   700   3  
2     1                   789   3
3     1                   800   3
4     1                   820   3
5     1                   900   3
6     1                  1000   3
7     1                  2000   3
8     1
9     1
10    1
12    1
13    1
14    1
16    1
18    1
90    1
87    1
100   2
110   2
120   2
122   2
125   2
169   2
178   2
190   2
200   2
500   2
520   2

So, everyone directly or indirectly connected to ID 1 belongs to network 1. 2 is a friend of 1, so everyone directly or indirectly connected to 2 is also in 1's network, and so on. 700 is not connected to 1, to a friend of 1, to a friend of a friend of 1, and so on. Thus 700 is in a different network, which is network 3.

Any help will be much appreciated...


Solution

  • Update

    library(igraph)
    library(dplyr)
    library(data.table)
    
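    setDT(df) %>%
        # wide -> long: one row per (ID, nominated friend) pair
        melt(id.vars = "ID", variable.name = "FID", value.name = "ID2") %>%
        # drop the unused nomination slots
        na.omit() %>%
        # the first two columns are read as the edge list
        setcolorder(c("ID", "ID2", "FID")) %>%
        graph_from_data_frame() %>%
        # connected components; the default mode ignores edge direction
        components() %>%
        membership() %>%
        # named vector -> two-column data frame
        stack() %>%
        setNames(c("Network_ID", "ID")) %>%
        rev() %>%
        type.convert(as.is = TRUE) %>%
        arrange(Network_ID, ID)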
    

    gives

         ID Network_ID
    1     1          1
    2     2          1
    3     3          1
    4     4          1
    5     5          1
    6     6          1
    7     7          1
    8     8          1
    9     9          1
    10   10          1
    11   12          1
    12   13          1
    13   14          1
    14   16          1
    15   18          1
    16   45          1
    17   87          1
    18   90          1
    19  100          2
    20  110          2
    21  120          2
    22  122          2
    23  125          2
    24  169          2
    25  178          2
    26  190          2
    27  200          2
    28  500          2
    29  520          2
    30  700          3
    31  789          3
    32  800          3
    33  820          3
    34  900          3
    35 1000          3
    36 2000          3
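
    For readers who prefer to see the intermediate objects, the same idea can be sketched step by step. This is only an illustration on a small made-up `df` (the values are hypothetical, not Add Health data); it spells out `mode = "weak"`, which is what `components()` uses by default and which treats a nomination as a link in both directions, matching the "directly or indirectly connected" requirement in the question.

    ```r
    library(igraph)
    library(data.table)

    # Toy data in the same wide layout as the question (hypothetical values)
    df <- data.frame(
      ID    = c(1L, 2L, 700L),
      FID_1 = c(2L, 5L, 800L),
      FID_2 = c(6L, NA, 789L)
    )

    # Wide -> long edge list: one row per (ID, nominated friend) pair
    edges <- melt(as.data.table(df), id.vars = "ID",
                  value.name = "ID2", na.rm = TRUE)[, .(ID, ID2)]

    # Vertex names are the IDs, stored as character
    g <- graph_from_data_frame(edges, directed = TRUE)

    # "weak" components ignore the direction of the nominations
    comp <- components(g, mode = "weak")

    result <- data.table(
      ID         = as.integer(names(comp$membership)),
      Network_ID = as.integer(comp$membership)
    )[order(Network_ID, ID)]
    print(result)
    ```

    `comp$no` holds the number of networks and `comp$csize` their sizes, which may be handy as a sanity check on the full 85,000-row data.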
    

    Data

    > dput(df)
    structure(list(ID = c(1L, 2L, 3L, 100L, 500L, 700L, 1000L), FID_1 = c(2L,
    5L, 1L, 110L, 100L, 800L, 789L), FID_2 = c(6L, 9L, 2L, 120L,
    110L, 789L, 2000L), FID_3 = c(7L, 12L, 4L, 122L, 122L, 900L,
    820L), FID_4 = c(9L, 45L, 7L, 125L, 125L, NA, 900L), FID_5 = c(10L,
    13L, 8L, 169L, 169L, NA, NA), FID_6 = c(NA, 90L, 9L, 178L, 178L,
    NA, NA), FID_7 = c(NA, 87L, 10L, 190L, 190L, NA, NA), FID_8 = c(NA,
    6L, 14L, 200L, 200L, NA, NA), FID_9 = c(NA, NA, 16L, 500L, 500L,
    NA, NA), FID_10 = c(NA, NA, 18L, 520L, 520L, NA, NA)), class = "data.frame", row.names = c(NA,
    -7L))
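
    One caveat worth checking on the real data: because of `na.omit()`, a student who nominates nobody and is nominated by nobody never enters the edge list, so the pipeline above would return no `Network_ID` for them. A possible workaround, sketched here on hypothetical toy data, is to pass the full list of IDs through the `vertices` argument of `graph_from_data_frame()` so that such students end up as their own one-person network.

    ```r
    library(igraph)

    # Hypothetical toy data: student 30 nominates nobody and is nominated by nobody
    df <- data.frame(
      ID    = c(10L, 20L, 30L),
      FID_1 = c(20L, 40L, NA),
      FID_2 = rep(NA_integer_, 3)
    )

    # Long edge list in base R: one row per non-missing nomination
    fid_cols <- grep("^FID_", names(df), value = TRUE)
    edges <- data.frame(
      ID  = rep(df$ID, times = length(fid_cols)),
      ID2 = unlist(df[fid_cols], use.names = FALSE)
    )
    edges <- edges[!is.na(edges$ID2), ]

    # List every ID explicitly so isolated students stay in the graph
    all_ids <- sort(unique(c(df$ID, edges$ID2)))
    g <- graph_from_data_frame(edges, directed = TRUE,
                               vertices = data.frame(name = all_ids))

    memb <- components(g)$membership
    result <- data.frame(ID         = as.integer(names(memb)),
                         Network_ID = as.integer(memb))
    print(result)
    ```

    Whether isolated students should count as networks of their own is a modelling choice; drop the `vertices` argument to exclude them again.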
    

    Are you looking for something like this?

    library(igraph)
    library(data.table)
    library(dplyr)
    
    setDT(df) %>%
        melt(id.var = "ID", variable.name = "FID", value.name = "ID2") %>%
        na.omit() %>%
        setcolorder(c("ID", "ID2", "FID")) %>%
        graph_from_data_frame() %>%
        plot(edge.label = E(.)$FID)
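
    With tens of thousands of students, a plot of the whole graph is unlikely to be readable. One option, shown here as a rough sketch on a hypothetical toy edge list, is to pull out a single network with `induced_subgraph()` and plot only that.

    ```r
    library(igraph)

    # Hypothetical toy edge list with two separate friendship networks
    edges <- data.frame(
      ID  = c(1L, 2L, 5L, 700L, 700L),
      ID2 = c(2L, 5L, 6L, 800L, 789L)
    )
    g <- graph_from_data_frame(edges, directed = TRUE)
    comp <- components(g)

    # Keep only the vertices that belong to the largest network
    largest <- which.max(comp$csize)
    sub_g <- induced_subgraph(g, which(comp$membership == largest))

    # plot(sub_g)  # uncomment to draw just this one network
    print(V(sub_g)$name)
    ```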
    



    Data

    structure(list(ID = 1:3, FID_1 = c(2L, 5L, 1L), FID_2 = c(6L,
    9L, 2L), FID_3 = c(7L, 12L, 4L), FID_4 = c(9L, 45L, 7L), FID_5 = c(10L,
    12L, 8L), FID_6 = c(NA, 90L, 9L), FID_7 = c(NA, 87L, 10L), FID_8 = c(NA,
    6L, 14L), FID_9 = c(NA, NA, 16L), FID_10 = c(NA, NA, 18L)), class = "data.frame", row.names = c(NA,
    -3L))