
Visualize Nodes and Their Connections in Clusters via networkx

I have a list of connections between pairs of nodes that describe similarities between entries in a dataset.

I'm thinking of visualizing the entries and their connections to show that there are clusters of very similar entries.

Each row stands for a pair of very similar nodes. I've set the weight to 1 for all of them, since it's required and I want all edges to be equally thick.

I've started with networkx; the problem is that I don't really know how to cluster the similar nodes together in a useful way.

I have the list of connections in a DataFrame, CC, with the columns source, target and weight. Here is a small sample:

smallSample = [[0, 1492, 1],
 [12, 937, 1],
 [16, 989, 1],
 [18, 371, 1],
 [18, 1140, 1],
 [26, 398, 1],
 [26, 1061, 1],
 [30, 1823, 1],
 [33, 1637, 1],
 [54, 1047, 1],
 [63, 565, 1]]

I create the graph like this:

import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
for _, row in CC.iterrows():
    # every pair is equally similar, so every edge gets weight 1
    G.add_edge(row['source'], row['target'], weight=1)

plt.figure(figsize=(20, 20))  # create the figure before drawing into it
pos = nx.spring_layout(G, seed=7)
# also tried: pos = nx.spring_layout(G, k=1, iterations=200)
nx.draw_networkx_nodes(G, pos, node_size=5)
nx.draw_networkx_edges(G, pos, edgelist=G.edges(), width=0.5)
plt.show()
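As a side note, the weight column is not strictly needed: spring_layout treats an edge without a weight attribute as having weight 1, and the line thickness comes from the width argument of draw_networkx_edges, not from the weight. A shorter way to build the same graph, assuming CC has the source and target columns used in the loop above, is nx.from_pandas_edgelist. A sketch on the small sample:

```python
import pandas as pd
import networkx as nx

# Edge list in the same shape as in the question: source, target, weight
smallSample = [[0, 1492, 1], [12, 937, 1], [16, 989, 1], [18, 371, 1],
               [18, 1140, 1], [26, 398, 1], [26, 1061, 1], [30, 1823, 1],
               [33, 1637, 1], [54, 1047, 1], [63, 565, 1]]
CC = pd.DataFrame(smallSample, columns=["source", "target", "weight"])

# One call instead of the iterrows() loop. By default no edge attributes
# are copied, so the constant weight column is simply ignored.
G = nx.from_pandas_edgelist(CC, source="source", target="target")

print(G.number_of_nodes(), G.number_of_edges())  # 20 11
```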

With the small sample above, the result looks like this:

Small Sample

The result for my real DataFrame, which has thousands of points:

Big Sample

How can I group the linked nodes together so that it is easier to see how many of them are in each cluster? I don't want them to overlap so heavily; it's really hard to tell how many there are, especially in the big sample.
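If a cluster is taken to mean a group of entries linked directly or indirectly, the connected components of the graph give the number of nodes per cluster without any drawing. A sketch using nx.connected_components on the small sample; treating components as clusters is an assumption about what "cluster" should mean here:

```python
from collections import Counter
import networkx as nx

smallSample = [[0, 1492, 1], [12, 937, 1], [16, 989, 1], [18, 371, 1],
               [18, 1140, 1], [26, 398, 1], [26, 1061, 1], [30, 1823, 1],
               [33, 1637, 1], [54, 1047, 1], [63, 565, 1]]

G = nx.Graph()
# Only the first two columns are node IDs; the third is the constant weight
G.add_edges_from((source, target) for source, target, _ in smallSample)

# Each connected component is one cluster of (transitively) similar entries
cluster_sizes = sorted((len(c) for c in nx.connected_components(G)), reverse=True)
print(cluster_sizes)           # [3, 3, 2, 2, 2, 2, 2, 2, 2]
print(Counter(cluster_sizes))  # Counter({2: 7, 3: 2})
```

The Counter reads as "7 clusters of size 2 and 2 clusters of size 3", which is the number that is hard to read off the plot.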


  • From an InfoVis perspective, there are a few things you can do.

    Besides using a larger k in the spring layout, I would suggest jitter, which shifts the position of each node slightly at random and reduces overlap. (There are many papers on jitter and better variants than the uniform one I use here, but uniform jitter is the simplest to implement.)
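A minimal sketch of uniform jitter as a reusable helper, assuming positions come as the {node: (x, y)} dict that nx.spring_layout returns. The name jitter_positions and the default amount are illustrative choices, not part of networkx:

```python
import numpy as np

def jitter_positions(pos, amount=0.025, seed=None):
    """Return a copy of a {node: (x, y)} layout with uniform jitter added.

    Each coordinate is shifted independently by a value drawn from
    U(-amount, amount), which pulls apart nodes that sit on top of each other.
    """
    rng = np.random.default_rng(seed)
    nodes = list(pos)
    coords = np.array([pos[n] for n in nodes], dtype=float).reshape(-1, 2)
    coords += rng.uniform(-amount, amount, size=coords.shape)
    return dict(zip(nodes, coords))

# Two nodes at exactly the same spot get separated slightly
pos = {"a": (0.0, 0.0), "b": (0.0, 0.0), "c": (1.0, 1.0)}
jpos = jitter_positions(pos, amount=0.025, seed=7)
```

Passing a seed makes the jitter reproducible between runs, which the global np.random.uniform call is not unless the global seed is set.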

    Recreating the dataset

    This code creates a similar-looking dataset:

    import random
    import numpy as np
    import pandas as pd
    from copy import deepcopy
    import networkx as nx
    import matplotlib.pyplot as plt
    from math import sqrt
    # Create a bigger dataset
    smallSample = [
     [0, 1492, 1],
     [12, 937, 1],
     [16, 989, 1],
     [18, 371, 1],
     [18, 1140, 1],
     [26, 398, 1],
     [26, 1061, 1],
     [30, 1823, 1],
     [33, 1637, 1],
     [54, 1047, 1],
     [63, 565, 1]]
    sample = deepcopy(smallSample)
    AMOUNT = 4000
    # Only the first two columns are node IDs; the third is the weight
    present_nodes = list(set(x for edge in sample for x in edge[:2]))
    i = 2
    while i < AMOUNT:
        source = target = None
        while source == target:
            if random.random() < 0.9:
                # Create at least one new node
                source = i
                if random.random() < 0.7: # High value for many small clusters
                    # Create a second new node
                    target = i = i+1
                else:
                    # Attach the new node to an existing one
                    target = random.choice(present_nodes)
            else: # Link existing ones
                source = random.choice(present_nodes)
                target = random.choice(present_nodes)
        i += 1
        sample.append([source, target, 1])
    CC = pd.DataFrame(sample, columns=["source", "target", "weight"], dtype=int)
    # Create the Graph
    G = nx.Graph()
    for _, row in CC.iterrows():
        G.add_edge(row['source'], row['target'], weight=1)

    Calculate Positions

    # Default k = 1/sqrt(len(G))
    pos = nx.spring_layout(G, k=1/sqrt(len(G)), seed=7, iterations=100)
    # cast the pos dict to an np.array
    apos = np.fromiter(pos.values(), dtype=np.dtype((float, 2)))

    Default Look



    plt.figure(figsize=(20, 20))  # create the figure before drawing into it
    nx.draw_networkx_nodes(G, pos, node_size=10, alpha=0.45, linewidths=0.2)
    nx.draw_networkx_edges(G, pos, edgelist=G.edges(), width=0.5, alpha=0.2)
    plt.title("Default")
    plt.show()


    Use a larger k value

    This increases the distance between nodes and makes the layout less clumpy.

    pos15 = nx.spring_layout(G, k=1.5/sqrt(len(G)), seed=7, iterations=100) # Larger k to make it less clumpy
    # cast the pos dict to an np.array
    apos15 = np.fromiter(pos15.values(), dtype=np.dtype((float, 2)))
    plt.figure(figsize=(20, 20))  # create the figure before drawing into it
    nx.draw_networkx_nodes(G, pos15, node_size=10, alpha=0.55, linewidths=0.2)
    nx.draw_networkx_edges(G, pos15, edgelist=G.edges(), width=0.5, alpha=0.2)
    plt.title("Larger k")
    plt.show()

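To check whether a larger k (or jitter) really reduces overlap, rather than judging by eye, one option is to count node pairs that lie closer than some radius in layout coordinates. A brute-force NumPy sketch; the function name and the radius are illustrative. It builds an n x n distance matrix, so for thousands of nodes scipy.spatial.cKDTree(apos).query_pairs(radius) is the lighter option:

```python
import numpy as np

def count_close_pairs(apos, radius):
    """Count unordered node pairs whose layout distance is below `radius`.

    apos: (n, 2) array of x/y positions. Memory grows quadratically
    with the number of nodes because of the full distance matrix.
    """
    diffs = apos[:, None, :] - apos[None, :, :]
    dist2 = (diffs ** 2).sum(axis=-1)
    upper = np.triu_indices(len(apos), k=1)  # each pair once, no self-pairs
    return int((dist2[upper] < radius ** 2).sum())

apos = np.array([[0.0, 0.0], [0.0, 0.01], [1.0, 1.0]])
print(count_close_pairs(apos, radius=0.05))  # 1
```

Since spring_layout rescales both layouts to the same range by default, count_close_pairs(apos, r) and count_close_pairs(apos15, r) with the same r are comparable.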

    Adding Jitter

    JITTER = 0.025
    # one uniform offset per node and coordinate, reused for both layouts
    jitter = np.random.uniform(low=-JITTER, high=JITTER, size=apos.shape)
    jpos = {k: p for k, p in zip(pos.keys(), apos + jitter)}
    jpos15 = {k: p for k, p in zip(pos15.keys(), apos15 + jitter)}
    plt.figure(figsize=(20, 20))
    nx.draw_networkx_nodes(G, jpos, node_size=10, alpha=0.45, linewidths=0.2)
    nx.draw_networkx_edges(G, jpos, edgelist=G.edges(), width=0.5, alpha=0.2)
    plt.title("default + jitter")
    plt.show()
    plt.figure(figsize=(20, 20))
    nx.draw_networkx_nodes(G, jpos15, node_size=10, alpha=0.55, linewidths=0.2)  # nodes overlap less, so the alpha can be a bit higher
    nx.draw_networkx_edges(G, jpos15, edgelist=G.edges(), width=0.5, alpha=0.2)
    plt.title("larger k + jitter")
    plt.show()

    Adding jitter / Larger k + jitter
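To compare several settings at once instead of one figure at a time, the k factors and jitter amounts can be drawn side by side in a grid of subplots. A sketch; layout_grid and its default values are illustrative, and it draws its jitter from a seeded NumPy generator instead of the global np.random.uniform used above:

```python
import math
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

def layout_grid(G, k_factors=(1.0, 1.5), jitters=(0.0, 0.025), seed=7):
    """Draw G once per (k factor, jitter) combination in a grid of subplots.

    k is given as a multiple of networkx's default 1/sqrt(n); jitter is the
    half-width of the uniform noise added to every coordinate. G must not be empty.
    """
    rng = np.random.default_rng(seed)
    n = len(G)
    fig, axes = plt.subplots(len(k_factors), len(jitters),
                             figsize=(5 * len(jitters), 5 * len(k_factors)),
                             squeeze=False)
    for i, factor in enumerate(k_factors):
        pos = nx.spring_layout(G, k=factor / math.sqrt(n), seed=seed, iterations=100)
        nodes = list(pos)
        apos = np.array([pos[v] for v in nodes])
        for j, amount in enumerate(jitters):
            noise = rng.uniform(-amount, amount, size=apos.shape)
            jpos = dict(zip(nodes, apos + noise))
            ax = axes[i][j]
            nx.draw_networkx_nodes(G, jpos, ax=ax, node_size=10, alpha=0.5, linewidths=0.2)
            nx.draw_networkx_edges(G, jpos, ax=ax, width=0.5, alpha=0.2)
            ax.set_title(f"k = {factor:g}/sqrt(n), jitter = {amount:g}")
    fig.tight_layout()
    return fig

# A random stand-in graph; replace with the G built from CC
G = nx.gnm_random_graph(200, 150, seed=1)
fig = layout_grid(G)
```

Call plt.show() (or fig.savefig(...)) afterwards to look at the grid.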

    In the end, it comes down to playing around with the parameters (k, JITTER, alpha) until you get something you like.
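A different way to attack the original question, not used in the plots above, is to lay out each connected component separately and place the components on a grid: clusters then cannot overlap, and their sizes can be compared at a glance. A minimal sketch of that idea; the helper component_grid_layout and the cell spacing are hypothetical, and with thousands of tiny components the grid itself becomes large:

```python
import math
import networkx as nx

def component_grid_layout(G, cell=3.0, seed=7):
    """Position each connected component in its own grid cell.

    Components are sorted largest first; each is laid out with spring_layout
    (coordinates within [-1, 1]) and shifted to the center of its cell.
    """
    components = sorted(nx.connected_components(G), key=len, reverse=True)
    cols = max(1, math.ceil(math.sqrt(len(components))))
    pos = {}
    for idx, nodes in enumerate(components):
        row, col = divmod(idx, cols)
        sub_pos = nx.spring_layout(G.subgraph(nodes), seed=seed)
        for node, (x, y) in sub_pos.items():
            # shift into the cell; y goes down so the largest cluster is top-left
            pos[node] = (x + col * cell, y - row * cell)
    return pos

G = nx.Graph([(0, 1492), (12, 937), (18, 371), (18, 1140), (26, 398), (26, 1061)])
pos = component_grid_layout(G)
```

The returned dict can be passed to nx.draw_networkx_nodes and nx.draw_networkx_edges like any other layout.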