Tags: python, matrix, adjacency-matrix, matrix-transform

Python - Convert a matrix to edge list/long form


I have a very large CSV file containing a matrix like this:

null,A,B,C
A,0,2,3
B,3,4,2
C,1,2,4

It is always an n*n matrix. The first column and the first row hold the names. I want to convert it to a three-column format (also called an edge list or long form) like this:

A,A,0
A,B,2
A,C,3
B,A,3
B,B,4
B,C,2
C,A,1
C,B,2
C,C,4

I have used:

row = 0
for line in fin:
    line = line.strip("\n")
    col = 0
    tokens = line.split(",")
    for t in tokens:
        fout.write("\n%s,%s,%s"%(row,col,t))
        col += 1
    row += 1

This doesn't work.

Could you please help? Thank you.


Solution

  • You also need to iterate over the column titles as you print out the individual cells.

    For a matrix file mat.csv:

    null,A,B,C
    A,0,2,3
    B,3,4,2
    C,1,2,4
    

    The following program:

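    # Python 3; the handle is named f so it does not shadow the csv module.
    with open("mat.csv") as f:
        # Header row: drop the corner cell, keep the column names.
        columns = f.readline().strip().split(',')[1:]
        for line in f:
            tokens = line.strip().split(',')
            row = tokens[0]  # first field is the row name
            for column, cell in zip(columns, tokens[1:]):
                print('{},{},{}'.format(row, column, cell))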
    

    prints out:

    A,A,0
    A,B,2
    A,C,3
    B,A,3
    B,B,4
    B,C,2
    C,A,1
    C,B,2
    C,C,4
    

    To generate only the upper triangle (the diagonal and everything above it), you can use the following script:

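    # Python 3; the handle is named f so it does not shadow the csv module.
    with open("mat.csv") as f:
        columns = f.readline().strip().split(',')[1:]
        # i counts data rows from 0, so row i starts at its diagonal cell.
        for i, line in enumerate(f):
            tokens = line.strip().split(',')
            row = tokens[0]
            for column, cell in zip(columns[i:], tokens[i+1:]):
                print('{},{},{}'.format(row, column, cell))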
    

    which results in the output:

    A,A,0
    A,B,2
    A,C,3
    B,B,4
    B,C,2
    C,C,4
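
Since the question mentions a very large file and writing to an output file, the same approach can be sketched with Python's standard csv module, which streams one row at a time and also copes with quoted fields. This is only a minimal sketch and not part of the original answer: the function name matrix_to_edge_list and its upper_only parameter are hypothetical, and it assumes Python 3 and a header row whose first cell is the corner placeholder.

```python
import csv
import io


def matrix_to_edge_list(src, dst, upper_only=False):
    """Read a labelled square matrix from src; write row,column,value lines to dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    columns = next(reader)[1:]  # drop the top-left corner cell
    data_rows = (r for r in reader if r)  # skip blank lines
    for i, tokens in enumerate(data_rows):
        row, cells = tokens[0], tokens[1:]
        # For the upper triangle, data row i starts at its diagonal cell.
        start = i if upper_only else 0
        for column, cell in zip(columns[start:], cells[start:]):
            writer.writerow([row, column, cell])


# In-memory demonstration using the matrix from the question.
sample = "null,A,B,C\nA,0,2,3\nB,3,4,2\nC,1,2,4\n"

full = io.StringIO()
matrix_to_edge_list(io.StringIO(sample), full)

upper = io.StringIO()
matrix_to_edge_list(io.StringIO(sample), upper, upper_only=True)
```

With real files, the function would be called on two open handles, for example open("mat.csv", newline="") for input and open("edges.csv", "w", newline="") for output (edges.csv is a made-up name). Only one row is held in memory at a time, so the size of the matrix should not matter much.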