I want to obtain the adjacency matrices of a networkx.MultiDiGraph. My code looks as follows:
import numpy as np
import networkx as nx
np.random.seed(123)
n_samples = 10
uv = [
(1, 2),
(2, 3),
(3, 4),
(4, 5),
(5, 6)
]
G = nx.MultiDiGraph()
for u, v in uv:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from([(u, v, dict(sample_id=s+1, weight=weights[s])) for s in range(n_samples)])
A = nx.to_numpy_array(G=G, nodelist=list(G.nodes))
As the docs state, the default of nx.to_numpy_array() for this type of graph is to sum the weights of the multiple edges. Therefore, the output looks as follows:
[[0. 5.44199353 0. 0. 0. 0. ]
[0. 0. 4.12783997 0. 0. 0. ]
[0. 0. 0. 5.37945594 0. 0. ]
[0. 0. 0. 0. 4.95418265 0. ]
[0. 0. 0. 0. 0. 5.18942126]
[0. 0. 0. 0. 0. 0. ]]
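The summing comes from the multigraph_weight parameter of nx.to_numpy_array (default sum; recent NetworkX versions also accept min or max). A small sketch, rebuilding the graph from the question, illustrates that whichever reducer is chosen, the 10 parallel edges still collapse into one value per cell, so a single call cannot produce per-sample matrices:

```python
import numpy as np
import networkx as nx

# Rebuild the example graph from the question
np.random.seed(123)
n_samples = 10
uv = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
G = nx.MultiDiGraph()
for u, v in uv:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from([(u, v, dict(sample_id=s + 1, weight=weights[s]))
                      for s in range(n_samples)])

nodes = list(G.nodes)
# Default reducer: parallel edge weights are summed into one cell
A_sum = nx.to_numpy_array(G, nodelist=nodes)
# Another reducer: keep only the largest parallel weight per cell
A_max = nx.to_numpy_array(G, nodelist=nodes, multigraph_weight=max)

# Both results are still a single (6, 6) matrix
print(A_sum.shape, A_max.shape)
```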
I would like to obtain 10 adjacency matrices, one for each s (i.e. each sample_id). My desired output should look as follows:
print(A.shape)
>> (6, 6, 10)
Please advise.
As indicated in the comments, you might want to generate individual DiGraphs instead of a MultiDiGraph.
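A minimal sketch of that per-sample DiGraph route, assuming each (u, v) pair has at most one edge per sample_id (a DiGraph keeps only one edge per pair, so a duplicate would overwrite the earlier one). Every node is added to each DiGraph so all matrices share the same node order; the names H, mats and samples are illustrative only:

```python
import numpy as np
import networkx as nx

# Rebuild the example graph from the question
np.random.seed(123)
n_samples = 10
uv = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
G = nx.MultiDiGraph()
for u, v in uv:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from([(u, v, dict(sample_id=s + 1, weight=weights[s]))
                      for s in range(n_samples)])

nodes = list(G.nodes)
samples = sorted({d['sample_id'] for _, _, d in G.edges(data=True)})

mats = []
for s in samples:
    # One plain DiGraph per sample_id, containing all nodes
    H = nx.DiGraph()
    H.add_nodes_from(nodes)
    H.add_edges_from((u, v, d) for u, v, d in G.edges(data=True)
                     if d['sample_id'] == s)
    mats.append(nx.to_numpy_array(H, nodelist=nodes))

# Stack along a new last axis -> shape (N, N, n_samples)
A = np.stack(mats, axis=-1)
print(A.shape)
```

Missing edges come out as 0 here (the to_numpy_array default), and summing A over the last axis reproduces the MultiDiGraph matrix from the question.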
That said, if you want to export multiple adjacency matrices based on the sample_id, you could export to a pandas DataFrame with to_pandas_edgelist, then reshape with pivot_table and split the arrays with groupby:
nodes = list(G.nodes)
df = (nx.to_pandas_edgelist(G)
.pivot_table(index=['sample_id', 'source'],
columns='target', values='weight')
.reindex(columns=nodes)
)
matrices = {k: g.droplevel(0).reindex(nodes).to_numpy()
for k, g in df.groupby('sample_id')}
Then you'll have a dictionary of {sample_id: adjacency_matrix}:
matrices[6]
array([[ nan, 0.423106, nan, nan, nan, nan],
[ nan, nan, 0.737995, nan, nan, nan],
[ nan, nan, nan, 0.322959, nan, nan],
[ nan, nan, nan, nan, 0.312261, nan],
[ nan, nan, nan, nan, nan, 0.250455],
[ nan, nan, nan, nan, nan, nan]])
NB. If you want 0s for missing edges, add .fillna(0) before converting with .to_numpy().
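If a single array is more convenient than a dict, one possible sketch applies the .fillna(0) tip and stacks the matrices in sample_id order with np.stack, giving a (10, 6, 6) array (the graph is rebuilt so the snippet runs on its own):

```python
import numpy as np
import networkx as nx

# Rebuild the example graph from the question
np.random.seed(123)
n_samples = 10
uv = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
G = nx.MultiDiGraph()
for u, v in uv:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from([(u, v, dict(sample_id=s + 1, weight=weights[s]))
                      for s in range(n_samples)])

nodes = list(G.nodes)
df = (nx.to_pandas_edgelist(G)
      .pivot_table(index=['sample_id', 'source'],
                   columns='target', values='weight')
      .reindex(columns=nodes)
      )
# fillna(0) turns missing edges into 0 instead of NaN
matrices = {k: g.droplevel(0).reindex(nodes).fillna(0).to_numpy()
            for k, g in df.groupby('sample_id')}

# Stack in sorted sample_id order -> shape (n_samples, N, N)
A = np.stack([matrices[k] for k in sorted(matrices)])
print(A.shape)
```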
Alternatively, to get a 3D NumPy array directly from the DataFrame, you could fill in the missing (source, target, sample_id) combinations with complete or reindex:
# pip install pyjanitor
import janitor
nodes = list(G.nodes)
df = nx.to_pandas_edgelist(G)
samples = df['sample_id'].unique()
N = len(nodes)
A = (df.complete({'source': nodes, 'target': nodes, 'sample_id': samples})
['weight'].to_numpy().reshape(N, N, -1)
)
Or, with pandas alone:
import pandas as pd
nodes = list(G.nodes)
df = nx.to_pandas_edgelist(G)
samples = df['sample_id'].unique()
N = len(nodes)
A = (df.set_index(['source', 'target', 'sample_id'])
.reindex(pd.MultiIndex.from_product([nodes, nodes, samples]))
['weight'].to_numpy().reshape(N, N, -1)
)
Output of A[:, :, 5] (6th sample):
array([[ nan, 0.423106, nan, nan, nan, nan],
[ nan, nan, 0.737995, nan, nan, nan],
[ nan, nan, nan, 0.322959, nan, nan],
[ nan, nan, nan, nan, 0.312261, nan],
[ nan, nan, nan, nan, nan, 0.250455],
[ nan, nan, nan, nan, nan, nan]])
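The same (6, 6, 10) array can also be built without pandas or pyjanitor by scattering each edge weight into a NaN-filled array. This is a swapped-in technique, not part of the answer above; it is only a sketch and assumes each (source, target, sample_id) triple occurs once (a repeated triple would silently keep the last weight). The names idx and sidx are illustrative:

```python
import numpy as np
import networkx as nx

# Rebuild the example graph from the question
np.random.seed(123)
n_samples = 10
uv = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
G = nx.MultiDiGraph()
for u, v in uv:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from([(u, v, dict(sample_id=s + 1, weight=weights[s]))
                      for s in range(n_samples)])

nodes = list(G.nodes)
samples = sorted({d['sample_id'] for _, _, d in G.edges(data=True)})
idx = {n: i for i, n in enumerate(nodes)}     # node label -> row/column position
sidx = {s: k for k, s in enumerate(samples)}  # sample_id -> position on last axis

N = len(nodes)
# NaN marks "no edge", matching the reindex-based output above
A = np.full((N, N, len(samples)), np.nan)
for u, v, d in G.edges(data=True):
    A[idx[u], idx[v], sidx[d['sample_id']]] = d['weight']

print(A[:, :, 5])
```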
I would probably prefer a (10, 6, 6) shape, to access each sample directly with A[i] (e.g. A[5] for sample_id 6):
A = (df.complete({'sample_id': samples, 'source': nodes, 'target': nodes})
['weight'].to_numpy().reshape(-1, N, N)
)
# or
A = (df.set_index(['sample_id', 'source', 'target'])
.reindex(pd.MultiIndex.from_product([samples, nodes, nodes]))
['weight'].to_numpy().reshape(-1, N, N)
)
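Both layouts hold the same numbers in a different axis order, so one can be derived from the other with np.moveaxis instead of reindexing twice. A sketch, assuming the pandas reindex variant above, that checks the moved (N, N, S) array against the directly built (S, N, N) one; A_nns, A_snn and A2 are illustrative names:

```python
import numpy as np
import networkx as nx
import pandas as pd

# Rebuild the example graph from the question
np.random.seed(123)
n_samples = 10
uv = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
G = nx.MultiDiGraph()
for u, v in uv:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from([(u, v, dict(sample_id=s + 1, weight=weights[s]))
                      for s in range(n_samples)])

nodes = list(G.nodes)
df = nx.to_pandas_edgelist(G)
samples = df['sample_id'].unique()
N = len(nodes)

# (N, N, S) layout, as in the first reindex example
A_nns = (df.set_index(['source', 'target', 'sample_id'])
         .reindex(pd.MultiIndex.from_product([nodes, nodes, samples]))
         ['weight'].to_numpy().reshape(N, N, -1))

# Move the sample axis to the front: a (S, N, N) view of the same data
A_snn = np.moveaxis(A_nns, -1, 0)

# (S, N, N) layout built directly, for comparison
A2 = (df.set_index(['sample_id', 'source', 'target'])
      .reindex(pd.MultiIndex.from_product([samples, nodes, nodes]))
      ['weight'].to_numpy().reshape(-1, N, N))

print(np.array_equal(A_snn, A2, equal_nan=True))
```

np.moveaxis returns a view, so no data is copied; call np.ascontiguousarray on it if a contiguous (S, N, N) array is needed.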