I have code that builds a graph from an adjacency matrix, which is itself derived from a table mapping workers to their managers. The source is a table with two columns (Worker, manager). It works perfectly on a small mock data set, but fails unexpectedly with the real data:
import pandas as pd
import networkx as nx
# Read input
df = pd.read_csv("org.csv")
# Create the input adjacency matrix
am = pd.DataFrame(0, columns=df["Worker"], index=df["Worker"])
# This way, it is impossible that the dataframe is not square,
# or that index and columns don't match
# Fill the matrix
for ix, row in df.iterrows():
    am.at[row["manager"], row["Worker"]] = 1
# At this point, am.shape returns a square dataframe (2825,2825)
# Generate the graph
G = nx.from_pandas_adjacency(am, create_using=nx.DiGraph)
This returns: NetworkXError: Adjacency matrix not square: nx,ny=(2825, 2829)
And indeed, the dimensions reported in the error differ from those of the input dataframe am.
Does anyone have an idea of what happens in from_pandas_adjacency that could lead to this mismatch?
In:
am = pd.DataFrame(0, columns=df["Worker"], index=df["Worker"])
# This way, it is impossible that the dataframe is not square,
your DataFrame is indeed square at that point, but when you later assign values in the loop, any manager that does not appear in "Worker" has no row yet, so the assignment creates a new one:
am.at[row["manager"], row["Worker"]]
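One way to check whether this applies to the real data is to list the managers that never appear in the "Worker" column. The snippet below is only a sketch: the small frame is a made-up stand-in for org.csv, and only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for the contents of org.csv
df = pd.DataFrame({"Worker": ["D", "E", "F"], "manager": ["A", "B", "A"]})

# Managers that have no entry of their own in the "Worker" column.
# Each of these would get a brand-new row when assigned with .at
missing = sorted(set(df["manager"].dropna()) - set(df["Worker"]))
print(missing)
```

If this list is not empty, the matrix grows beyond its initial square shape as soon as the loop runs.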
It is better to avoid the loop: use a crosstab, then reindex on the whole set of nodes:
am = pd.crosstab(df['manager'], df['Worker'])
nodes = am.index.union(am.columns)
am = am.reindex(index=nodes, columns=nodes, fill_value=0)
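As a quick sanity check, the crosstab approach can be run end to end on a toy frame (the data here is invented for illustration) to confirm that the result is square, that index and columns match, and that from_pandas_adjacency accepts it:

```python
import pandas as pd
import networkx as nx

# Toy data, assumed for illustration only
df = pd.DataFrame({"manager": ["A", "B", "A"], "Worker": ["D", "E", "F"]})

# Rows are managers, columns are workers, cells count the pairs
am = pd.crosstab(df["manager"], df["Worker"])

# Make rows and columns cover the same full set of labels
nodes = am.index.union(am.columns)
am = am.reindex(index=nodes, columns=nodes, fill_value=0)

# The matrix is now square with identical row and column labels
is_square = am.shape[0] == am.shape[1] and am.index.equals(am.columns)
print(am.shape, is_square)

G = nx.from_pandas_adjacency(am, create_using=nx.DiGraph)
edges = sorted(G.edges())
print(edges)
```

Note that crosstab counts occurrences, so a (manager, Worker) pair that appears more than once in the table ends up as an edge weight greater than 1 rather than a plain 1.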
Even better, if you don't really need the adjacency matrix, directly create the graph with nx.from_pandas_edgelist:
G = nx.from_pandas_edgelist(df, source='manager', target='Worker',
                            create_using=nx.DiGraph)
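A minimal sketch of the edge-list route on invented toy data, showing that the nodes and edges come straight from the two columns with no matrix in between. The roots variable is only an illustrative extra, not part of the original answer:

```python
import pandas as pd
import networkx as nx

# Toy data, assumed for illustration only
df = pd.DataFrame({"manager": ["A", "B", "A"], "Worker": ["D", "E", "F"]})

# Each row becomes one directed edge manager -> Worker
G = nx.from_pandas_edgelist(df, source="manager", target="Worker",
                            create_using=nx.DiGraph)

nodes = sorted(G.nodes())
edges = sorted(G.edges())
print(nodes)
print(edges)

# Nodes with no incoming edge are managers who report to nobody in the table
roots = sorted(n for n, deg in G.in_degree() if deg == 0)
print(roots)
```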
Example:
# input
df = pd.DataFrame({'manager': ['A', 'B', 'A'], 'Worker': ['D', 'E', 'F']})
# adjacency matrix
   A  B  D  E  F
A  0  0  1  0  1
B  0  0  0  1  0
D  0  0  0  0  0
E  0  0  0  0  0
F  0  0  0  0  0
# adjacency matrix with your code
Worker    D    E    F
Worker
D       0.0  0.0  0.0
E       0.0  0.0  0.0
F       0.0  0.0  0.0
A       1.0  NaN  1.0  # those rows are created
B       NaN  1.0  NaN  # after initializing am
Graph: a directed graph with three edges, A → D, A → F and B → E.