I have an issue that I've been unable to optimise and I'm sure that either igraph or tidy graphs must hold this function already or there must be a better way to do this. I am using R and igraph to do this but possibly tidygraphs would also do the job.
Problem: How to define networks a list of over two million edges (node 1 - linked to - node 2) into their own separate networks and to then define the network as it's highest weighted node category.
Data:
Edges:
from | to |
---|---|
1 | 2 |
3 | 4 |
5 | 6 |
7 | 6 |
8 | 6 |
This creates 3 networks N.B. in the real example we have loops and multiple edges to and from nodes (this is why I've used igraph as it easily deals with these).
Data: Node categories:
id | cat | weight |
---|---|---|
1 | traffic accident | 10 |
2 | abuse | 50 |
3 | abuse | 50 |
4 | speeding | 5 |
5 | murder | 100 |
6 | abuse | 50 |
7 | speeding | 5 |
8 | abuse | 50 |
Final table: The final table categorises each node and labels each network with the max category of the nodes
id | idcat | networkid | networkcat |
---|---|---|---|
1 | traffic accident | 1 | 50 |
2 | abuse | 1 | 50 |
3 | abuse | 2 | 50 |
4 | speeding | 2 | 50 |
5 | murder | 3 | 100 |
6 | abuse | 3 | 100 |
7 | speeding | 3 | 100 |
8 | abuse | 3 | 100 |
Current iterative solution and code: If there is no better solution to this then maybe we can speed this iteration up?
library(tidyverse)
library(igraph)
library(purrr) #might be an answer
library(tidyverse)
library(tidygraph) #might be an answer
from <- c(1,3,5,7,8)
to <- c(2,4,6,6,6)
edges <- data.frame(from,to)
id <- c(1,2,3,4,5,6,7,8)
cat <- c("traffic accident","abuse","abuse","speeding","murder","abuse","speeding","abuse")
weight <- c(10,50,50,5,100,50,5,50)
details <- data.frame(id,cat,weight)
g <- graph_from_data_frame(edges)# we can add the vertex details here as well g <-
graph_from_data_frame(edges,vertices=details) but we join these in later
plot(g)
dg <- decompose(g)# decomposing the network defines the separate networks
networks <- data.frame(id=as.integer(),
network_id=as.integer())
for (i in 1:length(dg)) { # this is likely too many to do at once. As the networks are already defined we can split this into chunks. There is a case here for parellisation
n <- dg[[i]][1] %>% # using the decomposed list of lists from i graph. There is an issue here as the list comes back with the node as an index. I can't find an easier way to get this out
as.data.frame() %>% # I can't work a way to bring out the data without changing to df and then using row names
row.names() %>% # and this returns a vector
as.data.frame() %>%
rename(id=1) %>%
mutate(network_id = i,
id=as.integer(id))
networks <-bind_rows(n,networks)
}
networks <- networks %>%
inner_join(details) # one way to bring in details
n_weight <- networks %>%
group_by(network_id) %>%
summarise(network_weight=max(weight))
networks <- networks %>%
inner_join(n_weight)
networks # final answer
filtered_n <- networks %>%
filter(network_weight==100) %>%
select(network_id) %>%
distinct()#this brings out just the network ID's of whatever we happen to want
filtered_n <- networks %>%
filter(network_id %in% filtered_n_id$network_id)
edges %>%
filter(from %in% filtered_n$id | to %in% filtered_n$id ) %>%
graph_from_data_frame() %>%
plot() # returns only the network/s that we want to view
Here is a solution just using igraph and base R.
networkid <- components(g)$membership
Table <- aggregate(id, list(networkid), function(x) { max(weight[x]) })
networkcat <- Table$x[networkid]
Final <- data.frame(id, idcat=cat, networkid, networkcat)
Final
id idcat networkid networkcat
1 1 traffic accident 1 50
2 2 abuse 1 50
3 3 abuse 2 50
4 4 speeding 2 50
5 5 murder 3 100
6 6 abuse 3 100
7 7 speeding 3 100
8 8 abuse 3 100