I want to identify networks where all people in the same network directly or indirectly connected through friendship nominations while no students from different networks are connected.
I am using the Add Health data. Each student nominates upto 10 friends. Say, sample data may look like this:
ID FID_1 FID_2 FID_3 FID_4 FID_5 FID_6 FID_7 FID_8 FID_9 FID_10
1 2 6 7 9 10 NA NA NA NA NA
2 5 9 12 45 13 90 87 6 NA NA
3 1 2 4 7 8 9 10 14 16 18
100 110 120 122 125 169 178 190 200 500 520
500 100 110 122 125 169 178 190 200 500 520
700 800 789 900 NA NA NA NA NA NA NA
1000 789 2000 820 900 NA NA NA NA NA NA
There are around 85,000 individuals. Could anyone please tell me how I can get network ID? So, I would like the data to look the following
ID network_ID ID network_ID
1 1 700 3
2 1 789 3
3 1 800 3
4 1 820 3
5 1 900 3
6 1 1000 3
7 1 2000 3
8 1
9 1
10 1
12 1
13 1
14 1
16 1
18 1
90 1
87 1
100 2
110 2
120 2
122 2
125 2
169 2
178 2
190 2
200 2
500 2
520 2
So, everyone directly or indirectly connected to ID 1 belong to network 1. 2 is a friend of 1. So, everyone directly or indirectly connected to 2 are also in 1's network and so on. 700 is not connected to 1 or friend of 1 or friend of friend of 1 and so on. Thus 700 is in a different network, which is network 3.
Any help will be much appreciated...
library(igraph)
library(dplyr)
library(data.table)
setDT(df) %>%
melt(id.var = "ID", variable.name = "FID", value.name = "ID2") %>%
na.omit() %>%
setcolorder(c("ID", "ID2", "FID")) %>%
graph_from_data_frame() %>%
components() %>%
membership() %>%
stack() %>%
setNames(c("Network_ID", "ID")) %>%
rev() %>%
type.convert(as.is = TRUE) %>%
arrange(Network_ID, ID)
gives
ID Network_ID
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 7 1
8 8 1
9 9 1
10 10 1
11 12 1
12 13 1
13 14 1
14 16 1
15 18 1
16 45 1
17 87 1
18 90 1
19 100 2
20 110 2
21 120 2
22 122 2
23 125 2
24 169 2
25 178 2
26 190 2
27 200 2
28 500 2
29 520 2
30 700 3
31 789 3
32 800 3
33 820 3
34 900 3
35 1000 3
36 2000 3
Data
> dput(df)
structure(list(ID = c(1L, 2L, 3L, 100L, 500L, 700L, 1000L), FID_1 = c(2L,
5L, 1L, 110L, 100L, 800L, 789L), FID_2 = c(6L, 9L, 2L, 120L,
110L, 789L, 2000L), FID_3 = c(7L, 12L, 4L, 122L, 122L, 900L,
820L), FID_4 = c(9L, 45L, 7L, 125L, 125L, NA, 900L), FID_5 = c(10L,
13L, 8L, 169L, 169L, NA, NA), FID_6 = c(NA, 90L, 9L, 178L, 178L,
NA, NA), FID_7 = c(NA, 87L, 10L, 190L, 190L, NA, NA), FID_8 = c(NA,
6L, 14L, 200L, 200L, NA, NA), FID_9 = c(NA, NA, 16L, 500L, 500L,
NA, NA), FID_10 = c(NA, NA, 18L, 520L, 520L, NA, NA)), class = "data.frame", row.names = c(NA,
-7L))
Are you looking for something like this?
library(data.table)
library(dplyr)
setDT(df) %>%
melt(id.var = "ID", variable.name = "FID", value.name = "ID2") %>%
na.omit() %>%
setcolorder(c("ID", "ID2", "FID")) %>%
graph_from_data_frame() %>%
plot(edge.label = E(.)$FID)
Data
structure(list(ID = 1:3, FID_1 = c(2L, 5L, 1L), FID_2 = c(6L,
9L, 2L), FID_3 = c(7L, 12L, 4L), FID_4 = c(9L, 45L, 7L), FID_5 = c(10L,
12L, 8L), FID_6 = c(NA, 90L, 9L), FID_7 = c(NA, 87L, 10L), FID_8 = c(NA,
6L, 14L), FID_9 = c(NA, NA, 16L), FID_10 = c(NA, NA, 18L)), class = "data.frame", row.names = c(NA,
-3L))