python pandas adjacency-matrix graph-tool

graph-tool not plotting/visualizing adjacency matrix correctly

I am using graph-tool to plot an adjacency matrix built from a dataframe. The adjacency matrix looks correct and symmetric, but the graph that graph-tool draws is wrong: connections that I can see in the dataframe/array are missing from the plot. For example, I know that node 100 should have more than one connection, but it is only connected to 99.

I created a dataframe with about 133 columns that looks roughly like this:

[Image: an adjacency matrix in a dataframe]

I created a symmetric matrix with the get_adjacency_matrix function shown in the reproducible example below, and the matrix itself looks fine. (I don't think I technically need a symmetric matrix for an undirected graph, but I initially suspected that an asymmetric matrix was causing the missing connections and nodes.)

However, when I graph the matrix with the following code, adapted from graph-tool's examples,

new_array = symmetric_adjacency_matrix.to_numpy()
print(new_array)
new_g = Graph(scipy.sparse.lil_matrix(new_array),directed=False)
graph_draw(new_g,vertex_text=new_g.vertex_index)

it creates this image:

[Image: a graph with connected nodes]

Looking at this, I know that node 100 should also be connected to node 116, among others, but it is not, and I am not sure why. Some nodes are also missing entirely: node 133 exists in my dataframe, but it does not show up in the graph.

Is it due to the conversion from a dataframe to an array? I'm not sure what my issue is here, nor how to fix it.
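One way to test that suspicion, using the sample edge list from the reproducible example below: `DataFrame.to_numpy()` returns only the values, so the index labels are dropped, and row/column i of the array is the i-th label in the index, not the node whose ID is i. A minimal sketch (pandas only, no graph-tool) that prints this mapping:

```python
import pandas as pd

# Same sample edge list as in the reproducible example
df = pd.DataFrame({'p1': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'p2': [2, 4, 3, 4, 5, 14, 4, 5, 14, 17]})

# Label-indexed adjacency matrix, built the same way as get_adjacency_matrix
adj = pd.crosstab(df['p1'], df['p2'])
idx = adj.columns.union(adj.index)
adj = adj.reindex(index=idx, columns=idx, fill_value=0)

# Values only: the index labels (node IDs) are discarded here
arr = adj.to_numpy()

# Array position i is what graph-tool will number as vertex i
for position, label in enumerate(adj.index):
    print(f"array row {position} -> node {label}")
```

With this data the array is 7 x 7, so node 14 ends up as vertex 5 and node 17 as vertex 6. If some IDs are absent from the real data, the same position shift would make vertex numbers in the drawing disagree with the dataframe's node IDs, which would look like wrong or missing connections and nodes.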

Edit: Reproducible example. (I used Google Colab, which is why graph-tool is installed this way; shell commands need a leading exclamation point.)

#installed conda
!pip install -q condacolab
import condacolab
condacolab.install()

#installed graph-tool
!mamba install -q graph-tool

#imported from graph-tool
from graph_tool.all import *

#below is pulled from their Google Colab installation example
g = collection.data["celegansneural"]
state = minimize_nested_blockmodel_dl(g)

#imports
import numpy as np
import scipy.sparse  # import the submodule explicitly; older SciPy does not load it on `import scipy`
import pandas as pd

#dataframe taken from my data and shortened
df = pd.DataFrame({'p1': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3], 
                   'p2': [2, 4, 3, 4, 5, 14, 4, 5, 14, 17]})

#creates symmetrical adjacency matrix
def get_adjacency_matrix(df, col1, col2):
    df = pd.crosstab(df[col1], df[col2])
    idx = df.columns.union(df.index)
    df = df.reindex(index = idx, columns=idx, fill_value=0)
    return df

a_to_b = get_adjacency_matrix(df, "p1", "p2")
b_to_a = get_adjacency_matrix(df, "p2", "p1")

symmetric_adjacency_matrix = a_to_b + b_to_a
symmetric_adjacency_matrix

#turns matrix into array, graphs it
new_array = symmetric_adjacency_matrix.to_numpy()
print(new_array)
new_g = Graph(scipy.sparse.lil_matrix(new_array),directed=False)
graph_draw(new_g,vertex_text=new_g.vertex_index)

Solution

Based on your example, I would suggest something along these lines.

import numpy as np
import scipy.sparse  # import the submodule explicitly; older SciPy does not load it on `import scipy`
import pandas as pd
from graph_tool.all import *

df = pd.DataFrame({'p1': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3], 
                   'p2': [2, 4, 3, 4, 5, 14, 4, 5, 14, 17]})

def get_sparse_symmetric_adjacency_matrix(df, col1, col2):
    # Get col1 and col2 as NumPy arrays
    rows, cols = df[col1].values, df[col2].values
    # Node IDs are used directly as matrix indices, so floats will not work here
    assert pd.api.types.is_integer_dtype(rows), "col1 must have integer type"
    assert pd.api.types.is_integer_dtype(cols), "col2 must have integer type"
    # Calculate the size of the square matrix. If this is not provided, SciPy will
    # only make the matrix large enough to contain the indices that are present.
    # Every ID from 0 up to the maximum becomes a vertex, even if it never appears
    num_nodes = max(np.max(rows), np.max(cols)) + 1
    # All values that are present in the matrix are set to one
    ones = np.ones_like(rows)
    # Create matrix
    # Use col1 and col2 as index into sparse matrix
    sparse = scipy.sparse.coo_matrix((ones, (rows, cols)), shape=(num_nodes, num_nodes))
    # Make matrix symmetric
    sparse = sparse + sparse.T
    # Remove duplicates created by the transpose (e.g. if you had both 2->1 and 1->2
    # in the original list)
    sparse = (sparse >= 1).astype('int8')
    return sparse


new_sparse_matrix = get_sparse_symmetric_adjacency_matrix(df, 'p1', 'p2')
new_g = Graph(new_sparse_matrix, directed=False)
print(new_sparse_matrix.toarray())
graph_draw(new_g,vertex_text=new_g.vertex_index)
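A side effect worth knowing: because the node IDs are used directly as matrix indices and the matrix is sized to the largest ID plus one, every ID that never appears (including 0) becomes a vertex with no edges, and these show up as isolated vertices in the drawing. A minimal sketch that lists them, rebuilding the same matrix with SciPy only (it does not call graph-tool):

```python
import numpy as np
import pandas as pd
import scipy.sparse

df = pd.DataFrame({'p1': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'p2': [2, 4, 3, 4, 5, 14, 4, 5, 14, 17]})
rows, cols = df['p1'].to_numpy(), df['p2'].to_numpy()

# Same sizing rule as get_sparse_symmetric_adjacency_matrix
num_nodes = max(rows.max(), cols.max()) + 1
sparse = scipy.sparse.coo_matrix((np.ones_like(rows), (rows, cols)),
                                 shape=(num_nodes, num_nodes))
sparse = ((sparse + sparse.T) >= 1).astype('int8')

# Degree of each vertex = number of ones in its row
degree = np.asarray(sparse.sum(axis=1)).ravel()
isolated = np.flatnonzero(degree == 0)
print(num_nodes)           # 18
print(isolated.tolist())   # [0, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16]
```

This is harmless when the IDs are dense, but with sparse or large IDs the plot can fill up with unconnected vertices.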

Explanation:

An edge-list representation of a graph is nearly identical to a COO (coordinate format) sparse matrix, which stores each value together with its row and column coordinates. Here the p1 column supplies the row indices, the p2 column supplies the column indices, and every value is one, which makes the matrix very fast to create.
We need to make the matrix symmetric, which I do by adding the matrix to its transpose. Technically, this step is not required for undirected graphs if you can ensure that the value in the p1 column is always less than the value in the p2 column.
I also remove duplicate entries. This is arguably unnecessary if your original edge list contains no duplicate edges.
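If the isolated vertices for unused IDs are unwanted, one alternative (not part of the approach above) is to relabel the IDs to consecutive integers first and keep the mapping, so that vertex i can be displayed as its original ID. A sketch using `pandas.factorize`; the graph-tool lines at the end are an untested assumption and are left as comments:

```python
import numpy as np
import pandas as pd
import scipy.sparse

df = pd.DataFrame({'p1': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'p2': [2, 4, 3, 4, 5, 14, 4, 5, 14, 17]})

# Factorize both columns together so each ID maps to exactly one code.
# labels[i] is the original node ID of vertex i.
stacked = pd.concat([df['p1'], df['p2']], ignore_index=True)
codes, labels = pd.factorize(stacked, sort=True)
n_edges = len(df)
src, dst = codes[:n_edges], codes[n_edges:]

# Build the symmetric 0/1 matrix on the compact vertex numbering
n = len(labels)
adj = scipy.sparse.coo_matrix((np.ones(n_edges, dtype='int8'), (src, dst)),
                              shape=(n, n))
adj = ((adj + adj.T) >= 1).astype('int8')

print(labels.tolist())  # [1, 2, 3, 4, 5, 14, 17]
print(adj.shape)        # (7, 7)

# Possible use with graph-tool (not executed here):
#   g = Graph(adj, directed=False)
#   vtext = g.new_vertex_property("string")
#   for v in g.vertices():
#       vtext[v] = str(labels[int(v)])
#   graph_draw(g, vertex_text=vtext)
```

The trade-off is that vertex indices no longer equal the node IDs, so the `labels` mapping has to be kept alongside the graph.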

The graph this draws:

[Image: the resulting graph]