
Degree Centrality and Clustering Coefficient in an Adjacency Matrix


Based on a dataset extracted from this link: Brain and Cosmic Web samples, I'm trying to do some Complex Network analysis.


The paper The Quantitative Comparison Between the Neuronal Network and the Cosmic Web claims to have used this dataset, as well as its adjacency matrices:

"Mij, i.e., a matrix with rows/columns equal to the number of detected nodes, with value Mij = 1 if the nodes are separated by a distance ≤ llink , or Mij = 0 otherwise".

I then probed into the matrix, like so:

import pandas as pd
from astropy.io import fits

with fits.open('mind_dataset/matrix_CEREBELLUM_large.fits') as data:
    matrix_cerebellum = pd.DataFrame(data[0].data)

which does not print a sparse 0/1 matrix, but rather a matrix of distances between nodes, expressed in pixels.


I've learned that the correspondence between 1 pixel and scale is:

neuronal_web_pixel = 0.32 # micrometers

I came up with a function to convert pixels to microns:

def pixels_to_scale(df, mind=False, cosmos=False):
    
    one_pixel_equals_parsec = cosmic_web_pixel
    one_pixel_equals_micron = neuronal_web_pixel
    
    if mind:
        df = df/one_pixel_equals_micron
        
    if cosmos:
        df = df/one_pixel_equals_parsec
        
    return df

Then, another function to binarize the matrix after the conversion:

def binarize_matrix(df, mind=False, cosmos=False):
    
    if mind:
        brain_Llink = 16.0 # microns
        # distances less than 16 microns
        brain_mask = (df<=brain_Llink)
        # convert to 1
        df = df.where(brain_mask, 1.0)
        
    if cosmos:
        cosmos_Llink = 1.2 # 1.2 mpc
        brain_mask = (df<=cosmos_Llink)
        df = df.where(brain_mask, 1.0)
        
    return df

Finally, with:

matrix_cerebellum = pixels_to_scale(matrix_cerebellum, mind=True)
matrix_cerebellum = binarize_matrix(matrix_cerebellum, mind=True)

matrix_cerebellum.head(5) prints my sparse matrix of (mostly) 0.0s and 1.0s:

0   1   2   3   4   5   6   7   8   9   ... 1848    1849    1850    1851    1852    1853    1854    1855    1856    1857
0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4   0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 rows × 1858 columns

Now I would like to calculate:

  1. Degree Centrality of the network, given by the formula:

    C_d(j) = k_j / (n - 1)

where k_j is the number of (undirected) connections to/from node j and n is the total number of nodes in the entire network.

  2. Clustering Coefficient, which quantifies the existence of infrastructure within the local vicinity of nodes, given by the formula:

    C(j) = 2 y_j / (k_j (k_j - 1))

in which y_j is the number of links between neighbouring nodes of node j.


For finding Degree Centrality, I have tried:

# find connections by adding matrix row values
matrix_cerebellum['K'] = matrix_cerebellum.sum(axis=1)
# applying formula
matrix_cerebellum['centrality'] = matrix_cerebellum['K']/matrix_cerebellum.shape[0]-1

Generates:

... K    centrality
    9.0   -0.995156
    6.0   -0.996771
    7.0   -0.996771
    11.0  -0.996233
    11.0  -0.994080

According to the paper, I should be finding:

"For the cerebellum slices we measured 〈k〉 ∼ 1.9 − 3.7",

For the average numbers of connections per node.

Also I'm finding negative centralities.


Does anyone know how to apply any of these formulas based on the dataframe above?


Solution

  • The webpage with the data sources states that the adjacency matrix files for brain samples give distances between connected nodes, expressed in pixels of the images used to reconstruct the networks. The paper then explains that, to get the actual adjacency matrix Mij (with 0 and 1 values only), the authors treat two nodes as connected when the distance between them is at most 16 micrometers. I don't see any information on how many pixels in the image correspond to one micrometer, which would be needed to compute the same matrix Mij that the authors used in their calculations.

    Furthermore, the value 〈k〉 is not the degree centrality or the clustering coefficient (both of which are defined per node), but rather the average number of connections per node in the network, computed using the matrix Mij. The paper then compares the observed distributions of degree centralities and clustering coefficients in the brain and cosmic networks to the distributions one would see in a random network with the same number of nodes and the same value of 〈k〉. The conclusion is that brain and cosmic networks are highly non-random.

    Edits:

    1. The conversion of 0.32 micrometers per pixel seems to be right. In the files with data on brain samples (both for cortex and cerebellum) the largest value is 50 pixels, which with this conversion corresponds to 16 micrometers. This suggests that the authors of the paper already thresholded the matrices, listing in them only distances not exceeding 16 micrometers. In view of this, to obtain the matrix Mij with 0 and 1 values only, one simply needs to replace all non-zero values with 1. An issue is that using the matrices obtained in this way one gets 〈k〉 = 9.22 for cerebellum and 〈k〉 = 7.13 for cortex, which is somewhat outside the ranges given in the paper. I don't know how to account for this discrepancy.

    2. Negative centrality values are due to a mistake (missing parentheses) in the code. It should be:

    matrix_cerebellum['centrality'] = matrix_cerebellum['K']/(matrix_cerebellum.shape[0] - 1)
    

    3. Clustering coefficient and degree centrality of each node can be computed using tools provided by the networkx library:

    from astropy.io import fits
    import networkx as nx
    
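    # get the adjacency matrix for cortex
    with fits.open('matrix_CORTEX_large.fits') as data:
        # any non-zero distance marks a link; build a new 0/1 integer array
        M = (data[0].data > 0).astype(int)
    
    # create a graph object (from_numpy_matrix was removed in networkx 3.0)
    G_cortex = nx.from_numpy_array(M)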
    
    # compute degree centrality of all nodes
    centrality = nx.degree_centrality(G_cortex)
    # compute clustering coefficient of all nodes
    clustering = nx.clustering(G_cortex)
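
As a cross-check of the formulas from the question, the same quantities can also be computed directly with NumPy. The sketch below is only an illustration: `dist_px` is a made-up 4-node distance matrix standing in for the array read from the FITS file, and nodes with fewer than two neighbours are assigned a clustering coefficient of 0, which is an assumption chosen to mirror what `nx.clustering` reports.

```python
import numpy as np

# Hypothetical 4-node distance matrix in pixels (0 means "not connected"),
# standing in for the array read from the FITS file.
dist_px = np.array([
    [0, 10, 20, 0],
    [10, 0, 30, 0],
    [20, 30, 0, 50],
    [0, 0, 50, 0],
], dtype=float)

# Any non-zero distance counts as a link.
M = (dist_px > 0).astype(int)
n = M.shape[0]

# Degree k_j: number of links of node j (row sums of M).
k = M.sum(axis=1)

# Average number of connections per node, <k>.
k_mean = k.mean()

# Degree centrality C_d(j) = k_j / (n - 1); note the parentheses around n - 1.
centrality = k / (n - 1)

# y_j: number of links among the neighbours of node j. The diagonal of M^3
# counts closed walks of length 3 through j, which is twice the number of
# triangles that contain j.
y = np.diagonal(M @ M @ M) / 2

# Clustering coefficient C(j) = 2 y_j / (k_j (k_j - 1)).
# Assumption: nodes with k_j < 2 get C(j) = 0.
denom = k * (k - 1)
clustering = np.divide(2 * y, denom, out=np.zeros(n), where=denom > 0)
```

On the real data, `k_mean` is the number to compare against the 〈k〉 range quoted from the paper, while `centrality` and `clustering` should agree with the values returned by networkx for the same matrix.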