rgraphdata-visualizationigraphgraph-visualization

R: How to Efficiently Visualize a Large Graph Network


I simulated some graph network data (~10,000 observations) in R and tried to visualize it using the visNetwork library in R. However, the data is very cluttered and is very difficult to analyze visually (I understand that in real life, network data is meant to be analyzed using graph query language).

For the time being, is there anything I can do to improve the visualization of the graph network I created (so I can explore some of the linkages and nodes that are all piled on top of each other)?

Can libraries such as 'networkD3' and 'diagrammeR' be used to better visualize this network?

I have attached my reproducible code below:

library(igraph)
library(dplyr)
library(visNetwork)

#create file from which to sample from
x5 <- sample(1:10000, 10000, replace=T)
#convert to data frame
x5 = as.data.frame(x5)

#create first file (take a random sample from the created file)
a = sample_n(x5, 9000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 9000)

#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")

graph <- graph.data.frame(c, directed=F)
graph <- simplify(graph)
graph

plot(graph)

library(visNetwork)
nodes <- data.frame(id = V(graph)$name, title = V(graph)$name)
nodes <- nodes[order(nodes$id, decreasing = F),]
edges <- get.data.frame(graph, what="edges")[1:2]

visNetwork(nodes, edges) %>%   visIgraphLayout(layout = "layout_with_fr") %>%
    visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>% 
    visInteraction(navigationButtons = TRUE)

Thanks


Solution

  • At the request of the OP, I am applying the method used in a previous answer Visualizing the result of dividing the network into communities to this problem.

    The network in the question was not created with a specified random seed. Here, I specify the seed for reproducibility.

    ## reproducible version of OP's network
    library(igraph)
    library(dplyr)
    
    set.seed(1234)
    #create file from which to sample from
    x5 <- sample(1:10000, 10000, replace=T)
    #convert to data frame
    x5 = as.data.frame(x5)
    
    #create first file (take a random sample from the created file)
    a = sample_n(x5, 9000)
    #create second file (take a random sample from the created file)
    b = sample_n(x5, 9000)
    
    #combine
    c = cbind(a,b)
    #create dataframe
    c = data.frame(c)
    #rename column names
    colnames(c) <- c("a","b")
    
    graph <- graph.data.frame(c, directed=F)
    graph <- simplify(graph)
    

    As noted by the OP, a simple plot is a mess. The referenced previous answer broke this into two parts:

    1. Plot all of the small components
    2. Plot the giant component

    1. Small components Different components get different colors to help separate them.

    ## Visualize the small components separately
    SmallV = which(components(graph)$membership != 1)
    SmallComp = induced_subgraph(graph, SmallV)
    LO_SC = layout_components(SmallComp, layout=layout_with_graphopt)
    plot(SmallComp, layout=LO_SC, vertex.size=9, vertex.label.cex=0.8, 
        vertex.color=rainbow(18, alpha=0.6)[components(graph)$membership[SmallV]])
    

    The Small components

    More could be done with this, but that is fairly easy and not the substance of the question, so I will leave this as the representation of the small components.

    2. Giant component
    Simply plotting the giant component is still hard to read. Here are two approaches to improving the display. Both rely on grouping the vertices. For this answer, I will use cluster_louvain to group the nodes, but you could try other community detection methods. cluster_louvain produces 47 communities.

    ## Now try for the giant component
    GiantV = which(components(graph)$membership == 1)
    GiantComp = induced_subgraph(graph, GiantV)
    GC_CL = cluster_louvain(GiantComp)
    max(GC_CL$membership)
    [1] 47
    

    Giant method 1 - grouped vertices
    Create a layout that emphasizes the communities

    GC_Grouped = GiantComp
    E(GC_Grouped)$weight = 1
    for(i in unique(membership(GC_CL))) {
        GroupV = which(membership(GC_CL) == i)
        GC_Grouped = add_edges(GC_Grouped, combn(GroupV, 2), attr=list(weight=6))
    } 
    
    set.seed(1234)
    LO = layout_with_fr(GC_Grouped)
    colors <- rainbow(max(membership(GC_CL)))
    par(mar=c(0,0,0,0))
    plot(GC_CL, GiantComp, layout=LO,
        vertex.size = 5, 
        vertex.color=colors[membership(GC_CL)], 
        vertex.label = NA, edge.width = 1)
    

    Giant component with grouped vertices

    This provides some insight, but the many edges make it a bit hard to read.

    Giant method 2 - contracted communities
    Plot each community as a single vertex. The size of the vertex reflects the number of nodes in that community. The color represents the degree of the community node.

    ## Contract the communities in the giant component
    CL.Comm = simplify(contract(GiantComp, membership(GC_CL)))
    D = unname(degree(CL.Comm))
    
    set.seed(1234)
    par(mar=c(0,0,0,0))
    plot(CL.Comm, vertex.size=sqrt(sizes(GC_CL)),
        vertex.label=1:max(membership(GC_CL)), vertex.cex = 0.8,
        vertex.color=round((D-29)/4)+1)
    

    Giant component with contracted communities

    This is much cleaner, but loses any internal structure of the communities.