twitterneo4j

Clarification on order and size of network


In social graphs is it common to have the number of nodes to be way less than the number of edges?

In my analysis of a twitter network I got results like this

nodes = 20,000

edges = 335,000

How can I interpret this huge gap between the numbers?


Solution

  • Yes, it's a common property of graphs, as the number of potential relationships between nodes increases at a rate proportional to the square of the number of nodes (exact formula below). Take a look how interconnections work between groups as the group scales.

    While we could actually create the nodes, we can simulate this by just looking at the count of all possible combinations that could produce valid non-redundant relationships, and show the count when the set is maximally connected.

    WITH range(1,100) as id
    UNWIND id as a
    UNWIND id as b
    WITH a, b
    WHERE a < b
    RETURN count(*)
    

    If they're all linked, without redundant relationships we end up with 4950 relationships from 100 maximally linked individuals. For 1000 people maximally connected you would have 499500 relationships. For 10000 you would have a little short of 49995000 relationships.

    There's a formula to capture this, the number of edges possible for a complete graph, and it's even more simple to apply it than our prior query:

    WITH 100 as n
    RETURN (n * (n - 1)) / 2.0
    

    Social networks are all about the myriad of connections between individuals, and as you can see as the number of nodes increase, the possible relationship count between them can skyrocket, even if they don't approach a complete graph.

    You can also consider that within a social graph, there will likely be quite a few clusters of friends where each cluster might be maximally connected, and that will push up the relationship count, moreso with the size of the cluster.