# Visualize Nodes and Their Connections in Clusters via networkx

I have a list of connections between pairs of nodes that describes similarities between entries in a dataset.

I'm thinking of visualizing the entries and their connections to show that there are clusters of very similar entries.

Each tuple stands for a pair of very similar nodes. I've chosen a weight of 1 for all of them since a weight is required, but I want all edges to be equally thick.

I've started with networkx; the problem is that I don't really know how to cluster the similar nodes together in a useful manner.

I have the list of connections in a DataFrame:

```
smallSample = [
    [0, 1492, 1],
    [12, 937, 1],
    [16, 989, 1],
    [18, 371, 1],
    [18, 1140, 1],
    [26, 398, 1],
    [26, 1061, 1],
    [30, 1823, 1],
    [33, 1637, 1],
    [54, 1047, 1],
    [63, 565, 1],
]
```

I create the graph in the following way:

```
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
for index, row in CC.iterrows():
    G.add_edge(CC['source'].loc[index], CC['target'].loc[index], weight=1)
pos = nx.spring_layout(G, seed=7)
nx.draw_networkx_nodes(G, pos, node_size=5)
nx.draw_networkx_edges(G, pos, edgelist=G.edges(), width=0.5)
pos = nx.spring_layout(G, k=1, iterations=200)
plt.figure(3, figsize=(2000,2000), dpi =2)
```

With the small sample provided above the result looks like this:

The result from my real df which consists of thousands of points:

How can I group the linked nodes together so that it is easier to see how many of them are in each cluster? I don't want them to overlap so heavily; it's really not easy to grasp how many of them there are, especially in the big sample.

Solution

From an InfoVis perspective there are a few things you can do:

- Transparency & node size

  Transparency can be used to visualize overlapping. You have to choose between two options: a lower alpha value (more transparent nodes) lets you see more overlapping layers, but many nodes need to overlap for that, so you should increase the node size. However, a larger node size makes individual nodes stick out less, and drawing the edges adds clutter (disable them or use thinner edges).

  TL;DR: Play with smaller node sizes and high alpha values vs. larger node sizes and lower alpha values.

- Play with the `k` parameter of `nx.spring_layout`: the larger it is, the further apart the nodes are placed. The default is `1/sqrt(len(G))`; a slight increase to `[1.2-1.7]/sqrt(len(G))` can give you some more clarity.

- Last but not least, I would suggest jitter, which shifts the position of each node by a small random amount and lessens overlap. There are many papers on jitter and some better variants than the plain uniform jitter I chose here, but uniform jitter is the simplest to implement.

#### Some recreation of the dataset

This code creates a similar-looking, larger dataset:

```
import random
import numpy as np
import pandas as pd
from copy import deepcopy
import networkx as nx
import matplotlib.pyplot as plt
from math import sqrt

random.seed(7)
np.random.seed(7)

# Create a bigger dataset
smallSample = [
    [0, 1492, 1],
    [12, 937, 1],
    [16, 989, 1],
    [18, 371, 1],
    [18, 1140, 1],
    [26, 398, 1],
    [26, 1061, 1],
    [30, 1823, 1],
    [33, 1637, 1],
    [54, 1047, 1],
    [63, 565, 1],
]
sample = deepcopy(smallSample)
AMOUNT = 4000
# Only the first two entries of each edge are node ids; the third is the weight
present_nodes = list(set(x for edge in sample for x in edge[:2]))
i = 2
while i < AMOUNT:
    source = target = None
    # Retry until the edge connects two different nodes
    while source == target:
        if random.random() < 0.9:
            # Create at least one new node
            source = i
            if random.random() < 0.7:  # High value for many small clusters
                # Create a second new node
                target = i = i + 1
                present_nodes.append(target)
            else:
                target = random.choice(present_nodes)
            present_nodes.append(source)
        else:  # Link existing ones
            source = random.choice(present_nodes)
            target = random.choice(present_nodes)
    i += 1
    sample.append([source, target, 1])

CC = pd.DataFrame(sample, columns=["source", "target", "weight"], dtype=int)

# Create the graph
G = nx.Graph()
for index, row in CC.iterrows():
    G.add_edge(row['source'], row['target'], weight=1)
```
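As an alternative to the row-by-row loop above, networkx can build the graph directly from the DataFrame with `nx.from_pandas_edgelist`. This is a minimal sketch, assuming the same `source`, `target` and `weight` column names as in `CC`; the names `edges_df` and `G_small` are only illustrative and the data is the small sample from the question.

```python
import networkx as nx
import pandas as pd

# Illustrative edge list: the small sample from the question
small_sample = [
    [0, 1492, 1], [12, 937, 1], [16, 989, 1], [18, 371, 1],
    [18, 1140, 1], [26, 398, 1], [26, 1061, 1], [30, 1823, 1],
    [33, 1637, 1], [54, 1047, 1], [63, 565, 1],
]
edges_df = pd.DataFrame(small_sample, columns=["source", "target", "weight"])

# Each row becomes one undirected edge carrying a "weight" attribute
G_small = nx.from_pandas_edgelist(
    edges_df, source="source", target="target", edge_attr="weight"
)
print(G_small.number_of_nodes(), G_small.number_of_edges())
```

For the large DataFrame the same call with `CC` in place of `edges_df` should give the same graph as the loop, without iterating over the rows in Python.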

**Calculate Positions**

```
# Default k = 1/sqrt(len(G))
pos = nx.spring_layout(G, k=1/sqrt(len(G)), seed=7, iterations=100)
# Cast the pos dict to an np.array of shape (number of nodes, 2)
apos = np.array(list(pos.values()))
```

### Transparency

**Default Look**

```
# Create the figure before drawing so the plot lands on it; adjust figsize to taste
plt.figure(figsize=(10, 10))
nx.draw_networkx_nodes(G, pos, node_size=10, alpha=0.45, linewidths=0.2)
nx.draw_networkx_edges(G, pos, edgelist=G.edges(), width=0.5, alpha=0.2)
plt.title("Transparency")
plt.show()
```

## Use a larger k value

This increases the distances between the nodes and makes the layout less clumpy:

```
# Larger k to make it less clumpy
pos15 = nx.spring_layout(G, k=1.5/sqrt(len(G)), seed=7, iterations=100)
# Cast the pos dict to an np.array of shape (number of nodes, 2)
apos15 = np.array(list(pos15.values()))

# Create the figure before drawing so the plot lands on it; adjust figsize to taste
plt.figure(figsize=(10, 10))
nx.draw_networkx_nodes(G, pos15, node_size=10, alpha=0.55, linewidths=0.2)
nx.draw_networkx_edges(G, pos15, edgelist=G.edges(), width=0.5, alpha=0.2)
plt.title("Larger k")
plt.show()
```
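To compare several `k` values without copying the block above, the layouts can be computed in a loop. This is a rough sketch: the helper name `layouts_for_factors`, the chosen factors and the tiny `demo` graph are assumptions for illustration, not part of the code above.

```python
from math import sqrt

import networkx as nx


def layouts_for_factors(graph, factors=(1.0, 1.2, 1.5, 1.7), seed=7, iterations=100):
    """Return {factor: pos}, where each layout uses k = factor / sqrt(number of nodes)."""
    base_k = 1 / sqrt(len(graph))  # the networkx default for k
    return {
        factor: nx.spring_layout(graph, k=factor * base_k, seed=seed, iterations=iterations)
        for factor in factors
    }


# Usage on a small demo graph: three 2-node clusters and one 3-node cluster
demo = nx.Graph([(0, 1), (2, 3), (4, 5), (6, 7), (7, 8)])
layouts = layouts_for_factors(demo)
```

Each entry of `layouts` can then be passed as `pos` to `nx.draw_networkx_nodes` and `nx.draw_networkx_edges` to judge which factor looks least cluttered.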

## Adding Jitter

```
JITTER = 0.025
# Uniform noise in [-JITTER, JITTER) per coordinate; the same offsets are reused for both layouts
jitter = np.random.uniform(low=-JITTER, high=JITTER, size=apos.shape)
jpos = {k: p for k, p in zip(pos.keys(), apos + jitter)}
jpos15 = {k: p for k, p in zip(pos15.keys(), apos15 + jitter)}

plt.figure(figsize=(10, 10))
nx.draw_networkx_nodes(G, jpos, node_size=10, alpha=0.45, linewidths=0.2)
nx.draw_networkx_edges(G, jpos, edgelist=G.edges(), width=0.5, alpha=0.2)
plt.title("default + jitter")
plt.show()

plt.figure(figsize=(10, 10))
# As nodes overlap less, I would increase the alpha level a bit
nx.draw_networkx_nodes(G, jpos15, node_size=10, alpha=0.55, linewidths=0.2)
nx.draw_networkx_edges(G, jpos15, edgelist=G.edges(), width=0.5, alpha=0.2)
plt.title("larger k + jitter")
plt.show()
```
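If you want to reuse the jitter step for several layouts, it can be wrapped in a small helper. This is only a sketch of the uniform jitter described above; the function name `add_jitter` and its parameters are made up for illustration, and unlike the code above it draws fresh offsets on every call.

```python
import numpy as np


def add_jitter(pos, amount=0.025, seed=7):
    """Return a copy of a layout dict with uniform noise in [-amount, amount) added to each coordinate."""
    rng = np.random.default_rng(seed)
    return {
        node: np.asarray(xy, dtype=float) + rng.uniform(-amount, amount, size=2)
        for node, xy in pos.items()
    }


# Usage with a tiny hand-written layout in which two nodes overlap exactly
demo_pos = {"a": (0.0, 0.0), "b": (0.0, 0.0), "c": (1.0, -1.0)}
jittered = add_jitter(demo_pos, amount=0.025)
```

The returned dict has the same keys as the input, so it can be passed as `pos` to the networkx drawing functions.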

In the end, it comes down to playing around with the parameters until you find something you like.
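Since the goal is to see how many nodes each cluster contains, the sizes can also be computed directly and used for coloring. This goes beyond the layout tweaks above and is only a minimal sketch, assuming that a cluster means a connected component of the graph. The names `G_small`, `component_sizes`, `size_histogram` and `node_colors` are illustrative.

```python
from collections import Counter

import networkx as nx

# Edges of the small sample from the question (weights omitted)
edges = [
    (0, 1492), (12, 937), (16, 989), (18, 371), (18, 1140), (26, 398),
    (26, 1061), (30, 1823), (33, 1637), (54, 1047), (63, 565),
]
G_small = nx.Graph(edges)

# Map every node to the size of the connected component it belongs to
component_sizes = {}
for component in nx.connected_components(G_small):
    for node in component:
        component_sizes[node] = len(component)

# Count how many clusters exist for each cluster size
size_histogram = Counter(len(c) for c in nx.connected_components(G_small))
print(size_histogram)

# One value per node, in the default node order used by the drawing functions
node_colors = [component_sizes[node] for node in G_small.nodes()]
```

Passing `node_color=node_colors` (optionally together with `cmap`) to `nx.draw_networkx_nodes` then colors every node by the size of its cluster.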
