2x2 contingency matrix (rows: Ci, columns: Cj):
2 1
1 0
Translates to:
[[ 0 0 0 1 ]
[ 0 0 1 0 ]]
The contingency matrix represents the outcome of two clustering algorithms, each with two clusters. The row totals indicate that Ci has three data points in, say, cluster 1 and one data point in, say, cluster 2. The column totals indicate that Cj has three data points in, say, cluster A and one data point in, say, cluster B. Therefore, both algorithms "agree" on two out of N = 4 data points.
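The counts in that paragraph can be checked with a few lines of plain Python. This is only a sketch: it assumes rows correspond to Ci and columns to Cj, and that "agreement" means the diagonal cells (cluster k of Ci matched with cluster k of Cj). The variable names are illustrative.

```python
# 2x2 contingency matrix from the example (rows: Ci, columns: Cj).
c = [[2, 1],
     [1, 0]]

row_totals = [sum(row) for row in c]         # cluster sizes under Ci
col_totals = [sum(col) for col in zip(*c)]   # cluster sizes under Cj
n = sum(row_totals)                          # total number of data points
# Assumption: cluster k of Ci is matched with cluster k of Cj (the diagonal).
agree = sum(c[k][k] for k in range(len(c)))

print(row_totals, col_totals, n, agree)  # [3, 1] [3, 1] 4 2
```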
As far as I can tell, there is no adjusted mutual information (AMI) function that takes a contingency matrix as input, so I would like to transform the contingency matrix into the 1D label vectors that the sklearn implementation of AMI expects.
Is there an efficient way to rewrite an NxN contingency matrix as two 1D vectors in Python?
It would look something like:
V1 = empty vector
V2 = empty vector
For each row index i
    For each column index j
        Append contingency_ij elements with value i to V1 and contingency_ij elements with value j to V2
The output should always be two vectors. Another example:
2 0 0
0 1 0
0 0 1
Would lead to two 1D vectors:
0 0 1 2
0 0 1 2
Well, this solves the problem as you have stated it. The final nested list v can be converted to a NumPy array. v needs to start with as many empty lists as there are dimensions in c (two for a 2D matrix).
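def produce_vectors(c):
    # One output list per matrix dimension: v[0] holds row indices, v[1] column indices.
    v = [[], []]
    for i, row in enumerate(c):
        for j, val in enumerate(row):
            # Cell (i, j) contributes val entries: i to the first vector, j to the second.
            v[0].extend([i] * val)
            v[1].extend([j] * val)
    return v

c = [[2, 1], [1, 0]]
print(produce_vectors(c))
c = [[2, 0, 0], [0, 1, 0], [0, 0, 1]]
print(produce_vectors(c))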
Output:
[[0, 0, 0, 1], [0, 0, 1, 0]]
[[0, 0, 1, 2], [0, 0, 1, 2]]
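For larger matrices, a vectorized variant may be worth trying. The sketch below is not part of the answer above; it assumes NumPy is installed and that every cell holds a non-negative integer count. The function name contingency_to_labels is made up for illustration.

```python
import numpy as np

def contingency_to_labels(c):
    """Expand a contingency matrix into two 1D label vectors (sketch)."""
    # Assumption: c contains non-negative integer counts.
    c = np.asarray(c, dtype=np.int64)
    # Indices of the non-empty cells, in row-major order.
    rows, cols = np.nonzero(c)
    counts = c[rows, cols]
    # Repeat each row/column index once per data point in that cell.
    return np.repeat(rows, counts), np.repeat(cols, counts)

v1, v2 = contingency_to_labels([[2, 1], [1, 0]])
print(v1.tolist(), v2.tolist())  # [0, 0, 0, 1] [0, 0, 1, 0]
```

The two arrays should then be usable as the labels_true and labels_pred arguments of sklearn.metrics.adjusted_mutual_info_score, which is the 1D input form the question asks for.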