python pandas dataframe networkx graph-theory

Adjacency matrix not square error from square dataframe with networkx

I have code that builds a graph from an adjacency matrix, which is itself derived from a table relating workers to their managers. The source is a table with two columns (Worker, manager). The code works perfectly on a small mock data set, but fails unexpectedly with the real data:

import pandas as pd
import networkx as nx

# Read input
df = pd.read_csv("org.csv")

# Create the input adjacency matrix
am = pd.DataFrame(0, columns=df["Worker"], index=df["Worker"])
# This way, it is impossible that the dataframe is not square,
# or that index and columns don't match

# Fill the matrix
for ix, row in df.iterrows():
    am.at[row["manager"], row["Worker"]] = 1

# At this point, am.shape returns a square dataframe (2825,2825)
# Generate the graph
G = nx.from_pandas_adjacency(am, create_using=nx.DiGraph)

This returns: NetworkXError: Adjacency matrix not square: nx,ny=(2825, 2829)

And indeed, the dimensions reported in the error are not the same as those of the input dataframe am.

Does anyone have an idea of what happens in from_pandas_adjacency that could lead to this mismatch?

Solution

In:

am = pd.DataFrame(0, columns=df["Worker"], index=df["Worker"])
# This way, it is impossible that the dataframe is not square,

your DataFrame is indeed square at that point, but when you later assign values in the loop, any manager that does not appear in "Worker" is missing from the index, so the assignment creates a new row:

am.at[row["manager"], row["Worker"]] = 1
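
To check whether this is what happens with your data, a minimal sketch along these lines may help. It uses a small made-up frame rather than your org.csv, and the names missing, shape_before and shape_after are only illustrative:

```python
import pandas as pd

# Made-up data: managers A and B never appear in the Worker column
df = pd.DataFrame({"manager": ["A", "B", "A"], "Worker": ["D", "E", "F"]})

am = pd.DataFrame(0, columns=df["Worker"], index=df["Worker"])
shape_before = am.shape  # square at this point

# Managers that are absent from the Worker column (and so from am.index)
missing = sorted(set(df["manager"]) - set(df["Worker"]))

for _, row in df.iterrows():
    # Assigning to a row label that does not exist enlarges the frame
    am.at[row["manager"], row["Worker"]] = 1

shape_after = am.shape  # one extra row per missing manager

print(shape_before, shape_after, missing)
```

If missing is not empty on the real data, the number of extra rows should equal its length.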

Better to avoid the loop: use a crosstab, then reindex on the whole set of nodes:

am = pd.crosstab(df['manager'], df['Worker'])
nodes = am.index.union(am.columns)
am = am.reindex(index=nodes, columns=nodes, fill_value=0)
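
One thing to be aware of, sketched here on made-up data: crosstab counts occurrences, so a (manager, Worker) pair that appears twice in the file ends up as 2 in the matrix, and from_pandas_adjacency would then store that value as the edge weight. If you want a plain 0/1 matrix, clipping is one option (the duplicated row and the name am01 are assumptions for illustration):

```python
import networkx as nx
import pandas as pd

# Made-up data with the pair (A, D) listed twice
df = pd.DataFrame({"manager": ["A", "B", "A", "A"],
                   "Worker": ["D", "E", "F", "D"]})

am = pd.crosstab(df["manager"], df["Worker"])
nodes = am.index.union(am.columns)
am = am.reindex(index=nodes, columns=nodes, fill_value=0)

count_a_d = am.loc["A", "D"]  # the duplicated pair is counted, not flagged

# Cap every cell at 1 to get an unweighted adjacency matrix
am01 = am.clip(upper=1)

G = nx.from_pandas_adjacency(am01, create_using=nx.DiGraph)
print(am01)
print(sorted(G.edges()))
```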

Even better, if you don't really need the adjacency matrix, directly create the graph with nx.from_pandas_edgelist:

G = nx.from_pandas_edgelist(df, source='manager', target='Worker',
                            create_using=nx.DiGraph)
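
As a quick check on made-up data, this sketch suggests why the edge-list route sidesteps the problem: nodes are taken from both columns, so managers who never appear in the Worker column still end up in the graph. The names nodes, edges and roots are illustrative only:

```python
import networkx as nx
import pandas as pd

# Made-up data: managers A and B never appear in the Worker column
df = pd.DataFrame({"manager": ["A", "B", "A"], "Worker": ["D", "E", "F"]})

G = nx.from_pandas_edgelist(df, source="manager", target="Worker",
                            create_using=nx.DiGraph)

nodes = sorted(G.nodes())
edges = sorted(G.edges())

# Nodes with no incoming edge have no manager listed in the data
roots = sorted(n for n, deg in G.in_degree() if deg == 0)

print(nodes, edges, roots)
```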

Example:

# input
df = pd.DataFrame({'manager': ['A', 'B', 'A'], 'Worker': ['D', 'E', 'F']})

# adjacency matrix
   A  B  D  E  F
A  0  0  1  0  1
B  0  0  0  1  0
D  0  0  0  0  0
E  0  0  0  0  0
F  0  0  0  0  0

# adjacency matrix with your code
Worker    D    E    F
Worker               
D       0.0  0.0  0.0
E       0.0  0.0  0.0
F       0.0  0.0  0.0
A       1.0  NaN  1.0  # those rows are created 
B       NaN  1.0  NaN  # after initializing am

Graph: (image of the resulting directed graph, with edges A → D, A → F and B → E)