I am using graph-tool to plot an adjacency matrix built from a dataframe. The adjacency matrix looks correct and symmetric, but the graph that graph-tool draws from it is wrong. (Looking at the dataframe/array, I can see where connections should be, but they are missing from the plot. For example, I know that node 100 should have more than one connection, but in the plot it is only connected to 99.)
I created a dataframe with about 133 columns that looks approximately like this:
I used this code to create a symmetric matrix, and it doesn't look like there are any problems there in terms of the matrix. (I think I don't technically need a symmetric matrix if it's undirected, but I thought that was what was causing my issue of missing connections and nodes, initially.)
However, when trying to graph the matrix using this code, borrowing from graph-tool,
new_array = symmetric_adjacency_matrix.to_numpy()
print(new_array)
new_g = Graph(scipy.sparse.lil_matrix(new_array),directed=False)
graph_draw(new_g,vertex_text=new_g.vertex_index)
it creates this image:
Unfortunately, when looking at this, I know that the node 100 should be connected to node 116 as well, along with others. But this is not happening, and I am not sure why. Moreover, it is missing nodes, such as 133. 133 exists in my dataframe, but it is not showing up in the graph.
Is it due to the conversion from a dataframe to an array? I'm not sure what my issue is here, nor how to fix it.
Edit: Reproducible example (I used Google Colab, which is why graph-tool is installed this way; shell commands there need a leading exclamation point):
#installed conda
!pip install -q condacolab
import condacolab
condacolab.install()
#installed graph-tool
!mamba install -q graph-tool
#imported from graph-tool
from graph_tool.all import *
#below is pulled from their google colab installation
g = collection.data["celegansneural"]
state = minimize_nested_blockmodel_dl(g)
#imports
import numpy as np
import scipy.sparse
import pandas as pd
#dataframe taken from my data and shortened
df = pd.DataFrame({'p1': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'p2': [2, 4, 3, 4, 5, 14, 4, 5, 14, 17]})
#creates symmetrical adjacency matrix
def get_adjacency_matrix(df, col1, col2):
    df = pd.crosstab(df[col1], df[col2])
    idx = df.columns.union(df.index)
    df = df.reindex(index=idx, columns=idx, fill_value=0)
    return df
a_to_b = get_adjacency_matrix(df, "p1", "p2")
b_to_a = get_adjacency_matrix(df, "p2", "p1")
symmetric_adjacency_matrix = a_to_b + b_to_a
symmetric_adjacency_matrix
#turns matrix into array, graphs it
new_array = symmetric_adjacency_matrix.to_numpy()
print(new_array)
new_g = Graph(scipy.sparse.lil_matrix(new_array),directed=False)
graph_draw(new_g,vertex_text=new_g.vertex_index)
Based on your example, I would suggest something along these lines.
import numpy as np
import scipy.sparse
import pandas as pd
from graph_tool.all import *
df = pd.DataFrame({'p1': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'p2': [2, 4, 3, 4, 5, 14, 4, 5, 14, 17]})
def get_sparse_symmetric_adjacency_matrix(df, col1, col2):
    # Get col1 and col2 as NumPy arrays
    rows, cols = df[col1].values, df[col2].values
    # Floating point types will not work here
    assert pd.api.types.is_integer_dtype(rows), "col1 must have integer type"
    assert pd.api.types.is_integer_dtype(cols), "col2 must have integer type"
    # Calculate size of matrix. If this is not provided, SciPy will only make
    # the matrix large enough to contain the values that are present
    num_nodes = max(np.max(rows), np.max(cols)) + 1
    # All values that are present in the matrix are set to one
    ones = np.ones_like(rows)
    # Create matrix
    # Use col1 and col2 as index into sparse matrix
    sparse = scipy.sparse.coo_matrix((ones, (rows, cols)), shape=(num_nodes, num_nodes))
    # Make matrix symmetric
    sparse = sparse + sparse.T
    # Remove duplicates created by transpose (e.g. if you had both 2->1 and
    # 1->2 in the original list)
    sparse = (sparse >= 1).astype('int8')
    return sparse
new_sparse_matrix = get_sparse_symmetric_adjacency_matrix(df, 'p1', 'p2')
new_g = Graph(new_sparse_matrix, directed=False)
print(new_sparse_matrix.toarray())
graph_draw(new_g,vertex_text=new_g.vertex_index)
Explanation:
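A likely reason the original approach seems to lose nodes and edges is that `to_numpy()` drops the dataframe's labels, so graph-tool only sees row and column positions. The sketch below rebuilds the question's crosstab table and prints the position-to-label mapping that the plain array no longer carries. The names `table`, `labels`, `array` and `position_to_label` are illustrative, and this is an inference from the question's code rather than something checked against graph-tool itself.

```python
import pandas as pd

df = pd.DataFrame({'p1': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'p2': [2, 4, 3, 4, 5, 14, 4, 5, 14, 17]})

# Same construction as in the question: a crosstab reindexed to a square table.
table = pd.crosstab(df['p1'], df['p2'])
idx = table.columns.union(table.index)
table = table.reindex(index=idx, columns=idx, fill_value=0)

# pandas keeps the node labels in the index...
labels = [int(x) for x in table.index]
# ...but the NumPy array only has positions 0..n-1.
array = table.to_numpy()

# Mapping from array position (what becomes the vertex index) to node label.
position_to_label = dict(enumerate(labels))
print(array.shape)
print(position_to_label)
```

With this data the table is 7x7, so the graph has vertices 0 to 6, and the vertex drawn as 5 is really the node labelled 14. A label such as 133 can therefore never appear as a vertex number unless the matrix has at least 134 rows, which is what indexing the sparse matrix by the labels themselves achieves.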
Graph this draws:
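Because the matrix above is sized by the largest label, labels that never occur (0, 6 to 13, 15 and 16 in this example) still become vertices and are drawn as isolated points. If that is unwanted, one possible variant is to renumber the labels compactly and keep the original labels in a separate array. The sketch below uses only NumPy and SciPy; the names `labels`, `positions`, `adj` and `dense` are illustrative, and the graph-tool lines in the trailing comment are an untested assumption.

```python
import numpy as np
import scipy.sparse

p1 = np.array([1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
p2 = np.array([2, 4, 3, 4, 5, 14, 4, 5, 14, 17])

# Sorted unique labels, plus the position of every endpoint in that list.
labels, positions = np.unique(np.concatenate([p1, p2]), return_inverse=True)
rows, cols = positions[:len(p1)], positions[len(p1):]

# Build the adjacency matrix over compact positions instead of raw labels.
n = len(labels)
ones = np.ones(len(rows), dtype=np.int8)
adj = scipy.sparse.coo_matrix((ones, (rows, cols)), shape=(n, n))
adj = adj + adj.T                    # make it symmetric
adj = (adj >= 1).astype('int8')      # clamp doubled entries back to 1

dense = adj.toarray()
print(labels)
print(dense)

# Assumption (not run here): the labels could be attached for drawing, e.g.
#   g = Graph(adj, directed=False)
#   vlabel = g.new_vertex_property("int", vals=labels)
#   graph_draw(g, vertex_text=vlabel)
```

The trade-off is that vertex indices no longer equal the node labels, so the `labels` array has to be kept alongside the graph for any lookup.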