python, python-3.x, pandas, graph, networkx

Build aggregated network graph from Pandas dataframe containing a column with list of nodes for each row?


I'm dipping my toes into network visualizations in Python. I have a dataframe like the following:

| user | nodes    |
| -----| ---------|
| A    | [0, 1, 3]|
| B    | [1, 2, 4]|
| C    | [0, 3]   |
|...   |          |

Is there a way to easily plot a network graph (NetworkX?) from data that contains the list of nodes on each row? The presence of a node in a row would increase the prominence of that node on the graph (or the prominence/weight of the edge in the relationship between two nodes).


I assume some transformation would be required to get the data into the appropriate format for NetworkX (or similar) to be able to create the graph relationships.

Thanks!


Solution

  • Since each row holds a Python list, pandas would not be any more efficient than plain Python here.

    You could use itertools.combinations to enumerate the edges (every pair of nodes that share a row) and collections.Counter to count them, then build the graph and draw each edge with a width based on its weight:

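    from itertools import combinations, chain
    from collections import Counter
    import networkx as nx
    import matplotlib.pyplot as plt

    # count how many rows contain each (sorted) pair of nodes
    c = Counter(chain.from_iterable(combinations(sorted(l), 2) for l in df['nodes']))

    # each pair becomes an edge whose weight is its count
    G = nx.Graph()
    G.add_weighted_edges_from((*e, w) for e, w in c.items())

    pos = nx.spring_layout(G)
    nx.draw_networkx(G, pos)

    # redraw each edge with a line width equal to its weight
    for *e, w in G.edges(data='weight'):
        nx.draw_networkx_edges(G, pos, edgelist=[e], width=w)

    plt.show()  # needed when running as a script rather than in a notebook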
    

    Output:

    [Image: networkx graph from weighted edges]

    Used input:

    import pandas as pd

    df = pd.DataFrame({'user': ['A', 'B', 'C'],
                       'nodes': [[0, 1, 3], [1, 2, 4], [0, 3]],
                      })
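
The question also asks for a node to become more prominent the more rows it appears in. Here is a minimal sketch of one way to do that, assuming "prominence" means node size and that a node counts once per row. The names `count_nodes_and_edges`, `draw_scaled` and `base_size` are made up for illustration; they are not part of NetworkX.

```python
from collections import Counter
from itertools import chain, combinations


def count_nodes_and_edges(rows):
    """Count in how many rows each node and each node pair appears."""
    # set() drops repeats within a row so a node counts once per row
    # (an assumption; the sample data has no repeats).
    unique_rows = [sorted(set(row)) for row in rows]
    node_counts = Counter(chain.from_iterable(unique_rows))
    edge_counts = Counter(
        chain.from_iterable(combinations(r, 2) for r in unique_rows)
    )
    return node_counts, edge_counts


def draw_scaled(rows, base_size=300):
    """Draw the graph with node size and edge width scaled by the counts."""
    import networkx as nx  # imported here so the counting helper stays stdlib-only

    node_counts, edge_counts = count_nodes_and_edges(rows)
    G = nx.Graph()
    G.add_nodes_from(node_counts)  # keeps nodes from single-node rows too
    G.add_weighted_edges_from((u, v, w) for (u, v), w in edge_counts.items())

    pos = nx.spring_layout(G, seed=0)  # fixed seed for a repeatable layout
    nx.draw_networkx(
        G, pos,
        # one size per node and one width per edge, in the graph's own order
        node_size=[base_size * node_counts[n] for n in G.nodes],
        width=[w for _, _, w in G.edges(data="weight")],
    )
    return G


rows = [[0, 1, 3], [1, 2, 4], [0, 3]]  # same values as df['nodes'].tolist()
node_counts, edge_counts = count_nodes_and_edges(rows)
```

With the sample data, nodes 0, 1 and 3 each appear in two rows and the pair (0, 3) is the only edge with weight 2. To plot, call `draw_scaled(df['nodes'])` followed by `matplotlib.pyplot.show()`; passing lists for `node_size` and `width` also avoids drawing every edge a second time.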