Tags: r, graph, igraph, outliers, graph-layout

How to control outlier nodes for network layout algorithms?


I am plotting a large graph (>10,000 nodes, >10,000 edges) with the igraph package, using the Fruchterman-Reingold layout algorithm. A few outlier nodes make the visualization difficult: 99% of the nodes huddle together, while the remaining 1% are placed far away. For example, 99.9% of the nodes have coordinates between 0 and 10, while 0.1% lie beyond 10,000. How can I control these outlier nodes so that all nodes can be shown in one plot?

Here is an example in which the 0.2% outlier nodes make it difficult to present the full graph.

> library(igraph)
> set.seed(12)
> ig <- erdos.renyi.game(12000,1/10000,directed=TRUE,loops=FALSE)
> ig.layout <- layout_with_fr(ig)
> apply(ig.layout,2,quantile,c(0,0.001,0.01,0.1,0.9,0.99,0.999,1))
               [,1]         [,2]
0%      -54.7584289   -58.192821
0.1%    -49.8806632   -51.090376
1%      -29.7822097   -33.073435
10%      -0.2196407    -1.170996
90%      10.1564691    10.513665
99%    2026.5245335   737.739440
99.9% 16433.7302032 13168.400710
100%  22614.7986797 22284.309659

Solution

  • One way to "control" the outliers is to get rid of them. This will reduce your initial problem, but you will still be stuck with a big graph that is hard to visualize. But let's deal with one thing at a time. First, the outliers.

    Note that set.seed has to be called before the graph is generated; otherwise the results are not reproducible. With the seed set first:

    library(igraph)
    set.seed(12)
    ig <- erdos.renyi.game(12000,1/10000,directed=TRUE,loops=FALSE)
    ig.layout <- layout_with_fr(ig)
    apply(ig.layout,2,quantile,c(0,0.001,0.01,0.1,0.9,0.99,0.999,1))
                   [,1]          [,2]
    0%    -5.359639e+01 -9.898871e+01
    0.1%  -4.996891e+01 -5.046219e+01
    1%    -3.040131e+01 -2.934615e+01
    10%   -1.221806e-02  1.513951e-02
    90%    1.207328e+01  1.130579e+01
    99%    1.111746e+03  6.994646e+02
    99.9%  1.418739e+04  1.182382e+04
    100%   1.968552e+04  2.025938e+04
    

    I get a result comparable to yours. More to the point, the graph is badly warped by the outliers.

    plot(ig, layout=ig.layout, vertex.size=4, vertex.label=NA,
        edge.arrow.size=0.4)
    

    [Figure: Original graph]

    But what are these outliers?

    igComp = components(ig)
    table(igComp$csize)
        1     2     3     4     5     6     7 10489 
     1041   137    42     8     5     1     1     1 
    

    Your graph has one very large component and quite a few small components. The "outliers" are the nodes in the small, disconnected components. My suggestion is that if you want to see the graph, eliminate these small components. Just look at the big component.
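The claim that the far-away nodes belong to the small components can be checked directly. The following is a minimal sketch, not part of the original analysis: it assumes `sample_gnp()` (the newer igraph name for `erdos.renyi.game()`) and uses the illustrative variable names `giant`, `in_giant`, `centre` and `dist_from_centre`. It measures how far each node sits from the median layout position and summarizes that distance separately for nodes inside and outside the largest component.

```r
library(igraph)

set.seed(12)
# sample_gnp() is the newer name for erdos.renyi.game(); same arguments
ig <- sample_gnp(12000, 1/10000, directed = TRUE, loops = FALSE)
ig.layout <- layout_with_fr(ig)

igComp <- components(ig)              # weakly connected components by default
giant <- which.max(igComp$csize)      # id of the largest component
in_giant <- igComp$membership == giant

# Euclidean distance of every node from the median position of the layout
centre <- apply(ig.layout, 2, median)
dist_from_centre <- sqrt(rowSums(sweep(ig.layout, 2, centre)^2))

# Summarize the distances for the two groups (FALSE = small components)
tapply(dist_from_centre, in_giant, quantile, probs = c(0.5, 0.99, 1))
```

If the large distances are concentrated in the `FALSE` group, dropping the small components is a reasonable way to proceed, as in the next step. The exact numbers depend on the igraph version and the random seed.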

    # Pick the largest component by size rather than assuming its id is 1
    C1 <- induced_subgraph(ig, which(igComp$membership == which.max(igComp$csize)))
    
    set.seed(12)
    C1.layout <- layout_with_fr(C1)
    apply(C1.layout,2,quantile,c(0,0.001,0.01,0.1,0.9,0.99,0.999,1))
                [,1]        [,2]
    0%    -18.111038 -30.5068075
    0.1%  -11.257167 -14.4507491
    1%     -4.570292  -3.2830470
    10%     0.124789   0.1836629
    90%     7.182714   7.1506193
    99%    12.291679  13.1523646
    99.9%  26.812703  23.6325447
    100%   35.186445  26.8564644
    

    Now the layout is more reasonable.

    plot(C1, layout=C1.layout, vertex.size=4, vertex.label=NA,
        edge.arrow.size=0.4)
    

    [Figure: Big component]

    Now the "outliers" are gone and we can see the core of the graph. You have a different problem now: it is hard to look at roughly 10,500 nodes and make sense of them, but at least you can see this core. I wish you luck with taking the exploration further.
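Keeping only the single largest component is one choice; the same idea extends to keeping every component above some size. Here is a small sketch on a toy graph, assuming a hypothetical helper named `keep_big_components` and an arbitrary `min_size` threshold (neither comes from the original answer).

```r
library(igraph)

# Hypothetical helper: keep only components with at least `min_size` vertices
keep_big_components <- function(g, min_size) {
  comp <- components(g)                    # weak components for directed graphs
  big <- which(comp$csize >= min_size)     # ids of the components to keep
  induced_subgraph(g, which(comp$membership %in% big))
}

# Toy graph: a 5-vertex cycle, a 2-vertex pair and one isolated vertex
g <- graph_from_literal(A-B-C-D-E-A, F-G, H)

g_big <- keep_big_components(g, min_size = 3)
vcount(g_big)           # 5: only the cycle survives
components(g_big)$no    # 1: a single component remains
```

On the graph from the question, a call such as `keep_big_components(ig, min_size = 5)` would retain the giant component plus the few components of size 5 to 7. Those would still be disconnected from the core, so the layout may push them outward again; whether that is acceptable is a judgment call for the plot at hand.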