Tags: python, for-loop, entropy, contingency

Contingency matrix to 1D format in Python


2x2 contingency matrix:

     Cj
    2  1
Ci
    1  0

Translates to:

[[ 0 0 0 1 ]
 [ 0 0 1 0 ]]

The contingency matrix represents the outcome of two clustering algorithms, each with two clusters. The row sums indicate that Ci has three data points in, say, cluster 1 and one data point in, say, cluster 2. Likewise, the column sums indicate that Cj has three data points in, say, cluster A and one data point in, say, cluster B. The diagonal entries sum to 2 + 0 = 2, so the two algorithms "agree" on two out of N = 4 data points.
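To make the arithmetic above concrete, here is a minimal sketch that recomputes the cluster sizes and the agreement count from the 2x2 matrix. The variable names are illustrative, and "agreement" is taken to mean the diagonal, which assumes cluster k of Ci is paired with cluster k of Cj.

```python
# 2x2 contingency matrix from the example: rows are Ci clusters, columns are Cj clusters.
c = [[2, 1],
     [1, 0]]

row_sums = [sum(row) for row in c]        # cluster sizes under Ci
col_sums = [sum(col) for col in zip(*c)]  # cluster sizes under Cj
n = sum(row_sums)                         # total number of data points
# Assumption: cluster k of Ci corresponds to cluster k of Cj, so agreement is the diagonal.
agree = sum(c[k][k] for k in range(len(c)))

print(row_sums, col_sums, n, agree)
```

For this matrix the row and column sums are both 3 and 1, N is 4, and the diagonal count is 2.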

Since there is no adjusted mutual information (AMI) function that takes a contingency matrix as input, I would like to transform the contingency matrix into 1D inputs for the sklearn implementation of AMI.

Is there an efficient way to rewrite an N×N contingency matrix in 1D vector form in Python?

It would look something like:

V1 = []
V2 = []
for each row index i:
    for each column index j:
        append contingency[i][j] copies of i to V1 and of j to V2

The output should always be two vectors. Another example:

2 0 0
0 1 0
0 0 1

Would lead to two 1D vectors:

0 0 1 2
0 0 1 2
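As a sanity check on the expected output, the two vectors can be tallied back into a contingency matrix. This is only a sketch: the helper name `tally` and its `n_clusters` parameter are hypothetical, introduced here for illustration.

```python
from collections import Counter

def tally(v1, v2, n_clusters):
    """Count how often each (row label, column label) pair occurs."""
    counts = Counter(zip(v1, v2))
    # Counter returns 0 for pairs that never occur, so empty cells come out as 0.
    return [[counts[(i, j)] for j in range(n_clusters)]
            for i in range(n_clusters)]

rebuilt = tally([0, 0, 1, 2], [0, 0, 1, 2], 3)
print(rebuilt)
```

If the vectors are right, the rebuilt matrix equals the 3x3 matrix shown above.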

Solution

  • Well, this solves the problem as you have stated it. The resulting list of lists v can be converted to a NumPy array if needed. To handle a contingency table with more than two dimensions, v would need one empty list per dimension of c.

    
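    def produce_vectors(c):
        # v[0] collects row labels, v[1] collects column labels.
        v = [[], []]

        for i, row in enumerate(c):
            for j, val in enumerate(row):
                # Cell (i, j) contributes val points labelled i in v[0] and j in v[1].
                v[0].extend([i] * val)
                v[1].extend([j] * val)
        return v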
    
    c = [[2,1],[1,0]]
    print(produce_vectors(c))
    c = [[2,0,0],[0,1,0],[0,0,1]]
    print(produce_vectors(c))
    

    Output:

    [[0, 0, 0, 1], [0, 0, 1, 0]]
    [[0, 0, 1, 2], [0, 0, 1, 2]]
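Since the question asks for an efficient approach, here is a vectorized variant of the same idea, sketched with NumPy's `np.nonzero` and `np.repeat`. The function name `contingency_to_labels` is hypothetical, and the sketch assumes the matrix holds non-negative integer counts.

```python
import numpy as np

def contingency_to_labels(c):
    """Expand a contingency matrix of integer counts into two 1D label arrays."""
    c = np.asarray(c)
    # Indices of the non-empty cells, returned in row-major order.
    rows, cols = np.nonzero(c)
    counts = c[rows, cols]
    # Repeat each row/column index as many times as its cell count.
    return np.repeat(rows, counts), np.repeat(cols, counts)

v1, v2 = contingency_to_labels([[2, 1], [1, 0]])
print(v1.tolist(), v2.tolist())

w1, w2 = contingency_to_labels([[2, 0, 0], [0, 1, 0], [0, 0, 1]])
print(w1.tolist(), w2.tolist())
```

The two returned arrays have the same form as the lists produced by produce_vectors, so they should be usable as the two label arguments of sklearn's adjusted_mutual_info_score.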