
Get adjacency matrices of networkx.MultiDiGraph


I want to obtain the adjacency matrices of a networkx.MultiDiGraph. My code looks as follows:

import numpy as np
import networkx as nx
np.random.seed(123)

n_samples = 10


uv = [
    (1, 2),
    (2, 3),
    (3, 4),
    (4, 5),
    (5, 6)
]


G = nx.MultiDiGraph()

for u, v in uv:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from([(u, v, dict(sample_id=s+1, weight=weights[s])) for s in range(n_samples)])

A = nx.to_numpy_array(G=G, nodelist=list(G.nodes))

As the docs state, for this type of graph nx.to_numpy_array() sums the weights of parallel edges by default (multigraph_weight=sum). Therefore, the output looks as follows:

[[0.         5.44199353 0.         0.         0.         0.        ]
 [0.         0.         4.12783997 0.         0.         0.        ]
 [0.         0.         0.         5.37945594 0.         0.        ]
 [0.         0.         0.         0.         4.95418265 0.        ]
 [0.         0.         0.         0.         0.         5.18942126]
 [0.         0.         0.         0.         0.         0.        ]]
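For reference, the multigraph_weight parameter of nx.to_numpy_array controls how parallel edges are combined. The sketch below rebuilds the graph from the question and compares the default with an explicit multigraph_weight=sum and with max. Whichever reducer you pick, you still get a single (6, 6) matrix, so this parameter alone cannot produce one matrix per sample.

```python
import numpy as np
import networkx as nx

# Rebuild the MultiDiGraph from the question (same seed, edges, sample count)
np.random.seed(123)
n_samples = 10
G = nx.MultiDiGraph()
for u, v in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from((u, v, dict(sample_id=s + 1, weight=weights[s]))
                     for s in range(n_samples))

nodes = list(G.nodes)

# Default behaviour: weights of parallel edges are summed
A_sum = nx.to_numpy_array(G, nodelist=nodes)
A_sum_explicit = nx.to_numpy_array(G, nodelist=nodes, multigraph_weight=sum)

# A different reducer (here max) still collapses all samples into one matrix
A_max = nx.to_numpy_array(G, nodelist=nodes, multigraph_weight=max)
print(A_sum.shape, A_max.shape)  # (6, 6) (6, 6)
```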

I would like to obtain 10 adjacency matrices, one for each sample_id. My desired output should look as follows:

print(A.shape)
>> (6, 6, 10)

Please advise.


Solution

  • As suggested in the comments, you might want to generate individual DiGraphs (one per sample) instead of a single MultiDiGraph.
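    Following that suggestion, here is a minimal sketch that splits the existing G into one nx.DiGraph per sample_id, converts each with nx.to_numpy_array, and stacks the results with np.stack into the requested (6, 6, 10) shape. It assumes each sample has at most one edge per (u, v) pair, as in the question; graphs and sample_ids are illustrative names.

```python
import numpy as np
import networkx as nx

# Rebuild the MultiDiGraph from the question
np.random.seed(123)
n_samples = 10
G = nx.MultiDiGraph()
for u, v in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from((u, v, dict(sample_id=s + 1, weight=weights[s]))
                     for s in range(n_samples))

nodes = list(G.nodes)
sample_ids = sorted({d["sample_id"] for _, _, d in G.edges(data=True)})

# One DiGraph per sample: the full node set plus only that sample's edges
graphs = {}
for sid in sample_ids:
    H = nx.DiGraph()
    H.add_nodes_from(nodes)  # keep every node so each matrix is N x N
    H.add_edges_from((u, v, d) for u, v, d in G.edges(data=True)
                     if d["sample_id"] == sid)
    graphs[sid] = H

# Stack along the last axis to get the (N, N, n_samples) layout
A = np.stack([nx.to_numpy_array(graphs[sid], nodelist=nodes)
              for sid in sample_ids], axis=-1)
print(A.shape)  # (6, 6, 10)
```

    Non-edges come out as 0 here, because that is the nonedge default of nx.to_numpy_array.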

    That said, if you want to export multiple adjacency matrices based on the sample_id, you could export to a pandas DataFrame with to_pandas_edgelist, reshape it with pivot_table, and split it into arrays with groupby:

    nodes = list(G.nodes)
    
    df = (nx.to_pandas_edgelist(G)
            .pivot_table(index=['sample_id', 'source'],
                         columns='target', values='weight')
            .reindex(columns=nodes)
         )
    
    matrices = {k: g.droplevel(0).reindex(nodes).to_numpy()
                for k, g in df.groupby('sample_id')}
    

    Then you'll have a dictionary of {sample_id: adjacency_matrix}:

    matrices[6]
    
    array([[     nan, 0.423106,      nan,      nan,      nan,      nan],
           [     nan,      nan, 0.737995,      nan,      nan,      nan],
           [     nan,      nan,      nan, 0.322959,      nan,      nan],
           [     nan,      nan,      nan,      nan, 0.312261,      nan],
           [     nan,      nan,      nan,      nan,      nan, 0.250455],
           [     nan,      nan,      nan,      nan,      nan,      nan]])
    

    NB. If you want 0s for missing edges, add .fillna(0) before calling .to_numpy().
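    Combining the dictionary approach with .fillna(0), here is a sketch that stacks the per-sample matrices along the last axis with np.stack to get the (6, 6, 10) array from the question. It rebuilds G as in the question; ordering the layers by sorted sample_id is an assumption about the order you want.

```python
import numpy as np
import networkx as nx

# Rebuild the MultiDiGraph from the question
np.random.seed(123)
n_samples = 10
G = nx.MultiDiGraph()
for u, v in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from((u, v, dict(sample_id=s + 1, weight=weights[s]))
                     for s in range(n_samples))

nodes = list(G.nodes)

# Same pivot as above: rows (sample_id, source), columns target
df = (nx.to_pandas_edgelist(G)
        .pivot_table(index=['sample_id', 'source'],
                     columns='target', values='weight')
        .reindex(columns=nodes))

# .fillna(0) turns missing edges into 0 instead of NaN
matrices = {k: g.droplevel(0).reindex(nodes).fillna(0).to_numpy()
            for k, g in df.groupby('sample_id')}

# Stack the per-sample matrices into an (N, N, n_samples) array
A = np.stack([matrices[k] for k in sorted(matrices)], axis=-1)
print(A.shape)  # (6, 6, 10)
```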

    Alternatively, to get a 3D NumPy array directly from the DataFrame, you could fill in the missing (source, target, sample_id) combinations with complete (from pyjanitor) or reindex, then reshape:

    # pip install pyjanitor
    import janitor
    
    nodes = list(G.nodes)
    df = nx.to_pandas_edgelist(G)
    samples = df['sample_id'].unique()
    N = len(nodes)
    
    A = (df.complete({'source': nodes, 'target': nodes, 'sample_id': samples})
           ['weight'].to_numpy().reshape(N, N, -1)
        )
    

    Or:

    import pandas as pd
    
    nodes = list(G.nodes)
    df = nx.to_pandas_edgelist(G)
    samples = df['sample_id'].unique()
    N = len(nodes)
    
    A = (df.set_index(['source', 'target', 'sample_id'])
           .reindex(pd.MultiIndex.from_product([nodes, nodes, samples]))
           ['weight'].to_numpy().reshape(N, N, -1)
        )
    

    Output of A[:, :, 5] (6th sample):

    array([[     nan, 0.423106,      nan,      nan,      nan,      nan],
           [     nan,      nan, 0.737995,      nan,      nan,      nan],
           [     nan,      nan,      nan, 0.322959,      nan,      nan],
           [     nan,      nan,      nan,      nan, 0.312261,      nan],
           [     nan,      nan,      nan,      nan,      nan, 0.250455],
           [     nan,      nan,      nan,      nan,      nan,      nan]])
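    To double-check the layout of the reindex variant above, here is a short sketch that sums the samples back together with np.nansum (treating NaN non-edges as 0) and compares the result with the default nx.to_numpy_array output, which sums parallel edges.

```python
import numpy as np
import pandas as pd
import networkx as nx

# Rebuild the MultiDiGraph from the question
np.random.seed(123)
n_samples = 10
G = nx.MultiDiGraph()
for u, v in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from((u, v, dict(sample_id=s + 1, weight=weights[s]))
                     for s in range(n_samples))

nodes = list(G.nodes)
df = nx.to_pandas_edgelist(G)
samples = df['sample_id'].unique()
N = len(nodes)

# Reindex variant: (N, N, n_samples) array with NaN for non-edges
A = (df.set_index(['source', 'target', 'sample_id'])
       .reindex(pd.MultiIndex.from_product([nodes, nodes, samples]))
       ['weight'].to_numpy().reshape(N, N, -1))

# Summing over the sample axis should reproduce the default summed matrix
A_summed = nx.to_numpy_array(G, nodelist=nodes)
print(np.allclose(np.nansum(A, axis=-1), A_summed))  # True
```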
    

    I would probably prefer a (10, 6, 6) shape to access each sample directly with A[i], where i is the sample's position in samples:

    A = (df.complete({'sample_id': samples, 'source': nodes, 'target': nodes})
           ['weight'].to_numpy().reshape(-1, N, N)
        )
    
    # or
    A = (df.set_index(['sample_id', 'source', 'target'])
           .reindex(pd.MultiIndex.from_product([samples, nodes, nodes]))
           ['weight'].to_numpy().reshape(-1, N, N)
        )
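    If you'd rather not depend on pandas or pyjanitor, here is a sketch that fills a preallocated NumPy array straight from G.edges(data=True) in the (10, 6, 6) layout, then uses np.moveaxis to get the (6, 6, 10) one. It assumes at most one edge per (source, target, sample_id), as in the question's graph; idx and sidx are illustrative helper mappings.

```python
import numpy as np
import networkx as nx

# Rebuild the MultiDiGraph from the question
np.random.seed(123)
n_samples = 10
G = nx.MultiDiGraph()
for u, v in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]:
    weights = np.random.uniform(0, 1, size=n_samples)
    G.add_edges_from((u, v, dict(sample_id=s + 1, weight=weights[s]))
                     for s in range(n_samples))

nodes = list(G.nodes)
idx = {node: i for i, node in enumerate(nodes)}  # node -> row/column index
sample_ids = sorted({d["sample_id"] for _, _, d in G.edges(data=True)})
sidx = {sid: k for k, sid in enumerate(sample_ids)}  # sample_id -> layer

# (n_samples, N, N) array with NaN for non-edges (use np.zeros for 0s).
# Assumes at most one edge per (u, v, sample_id); duplicates would overwrite.
A = np.full((len(sample_ids), len(nodes), len(nodes)), np.nan)
for u, v, d in G.edges(data=True):
    A[sidx[d["sample_id"]], idx[u], idx[v]] = d["weight"]

# A[k] is the matrix of sample_ids[k]; np.moveaxis returns a view in the
# (N, N, n_samples) layout from the question without copying the data
A_nnk = np.moveaxis(A, 0, -1)
print(A.shape, A_nnk.shape)  # (10, 6, 6) (6, 6, 10)
```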