rcategoriesigraphsnatidygraph

Define and categorise separate networks in R


I have an issue that I've been unable to optimise and I'm sure that either igraph or tidy graphs must hold this function already or there must be a better way to do this. I am using R and igraph to do this but possibly tidygraphs would also do the job.

Problem: How to define networks a list of over two million edges (node 1 - linked to - node 2) into their own separate networks and to then define the network as it's highest weighted node category.

Data:

Edges:

from to
1 2
3 4
5 6
7 6
8 6

This creates 3 networks N.B. in the real example we have loops and multiple edges to and from nodes (this is why I've used igraph as it easily deals with these).

Network

Data: Node categories:

id cat weight
1 traffic accident 10
2 abuse 50
3 abuse 50
4 speeding 5
5 murder 100
6 abuse 50
7 speeding 5
8 abuse 50

Final table: The final table categorises each node and labels each network with the max category of the nodes

id idcat networkid networkcat
1 traffic accident 1 50
2 abuse 1 50
3 abuse 2 50
4 speeding 2 50
5 murder 3 100
6 abuse 3 100
7 speeding 3 100
8 abuse 3 100

Current iterative solution and code: If there is no better solution to this then maybe we can speed this iteration up?

library(tidyverse)
library(igraph)
library(purrr) #might be an answer
library(tidyverse)
library(tidygraph) #might be an answer

from <- c(1,3,5,7,8)
to <- c(2,4,6,6,6)
edges <- data.frame(from,to)

id <- c(1,2,3,4,5,6,7,8)
cat <- c("traffic accident","abuse","abuse","speeding","murder","abuse","speeding","abuse")
weight <- c(10,50,50,5,100,50,5,50)

details <- data.frame(id,cat,weight) 

g <- graph_from_data_frame(edges)# we can add the vertex details here as well g <- 
graph_from_data_frame(edges,vertices=details) but we join these in later
plot(g)

dg <- decompose(g)# decomposing the network defines the separate networks 

networks <- data.frame(id=as.integer(),
                   network_id=as.integer())

for (i in 1:length(dg)) { # this is likely too many to do at once. As the networks are already defined we can split this into chunks. There is a case here for parellisation
  n <- dg[[i]][1] %>% # using the decomposed list of lists from i graph. There is an issue here as the list comes back with the node as an index. I can't find an easier way to get this out
    as.data.frame() %>% # I can't work a way to bring out the data without changing to df and then using row names
    row.names() %>% # and this returns a vector
    as.data.frame() %>% 
    rename(id=1) %>% 
    mutate(network_id = i,
           id=as.integer(id))

  networks <-bind_rows(n,networks)
}  

networks <- networks %>% 
  inner_join(details) # one way to bring in details

n_weight <- networks %>%
  group_by(network_id) %>% 
  summarise(network_weight=max(weight))

networks <- networks %>% 
  inner_join(n_weight)

networks # final answer

filtered_n <- networks %>% 
  filter(network_weight==100) %>% 
  select(network_id) %>% 
  distinct()#this brings out just the network ID's of whatever we happen to want

filtered_n <- networks %>% 
  filter(network_id %in% filtered_n_id$network_id)

edges %>% 
  filter(from %in% filtered_n$id | to %in% filtered_n$id ) %>% 
  graph_from_data_frame() %>% 
  plot() # returns only the network/s that we want to view

Solution

  • Here is a solution just using igraph and base R.

    networkid <- components(g)$membership
    Table <- aggregate(id, list(networkid),  function(x) { max(weight[x]) })
    networkcat <-  Table$x[networkid]
    Final <- data.frame(id, idcat=cat, networkid, networkcat)
    
    Final
      id            idcat networkid networkcat
    1  1 traffic accident         1         50
    2  2            abuse         1         50
    3  3            abuse         2         50
    4  4         speeding         2         50
    5  5           murder         3        100
    6  6            abuse         3        100
    7  7         speeding         3        100
    8  8            abuse         3        100