I have a very large CSV file containing a matrix like this:
null,A,B,C
A,0,2,3
B,3,4,2
C,1,2,4
It is always an n*n matrix. The first column and the first row hold the names. I want to convert it to a 3-column format (also called an edge list or long form) like this:
A,A,0
A,B,2
A,C,3
B,A,3
B,B,4
B,C,2
C,A,1
C,B,2
C,C,4
I have used:
row = 0
for line in fin:
    line = line.strip("\n")
    col = 0
    tokens = line.split(",")
    for t in tokens:
        fout.write("\n%s,%s,%s"%(row,col,t))
        col += 1
    row += 1
but it doesn't work.
Could you please help? Thank you.
You also need to pair each cell with its column title as you print out the individual cells.
For a matrix file mat.csv:
null,A,B,C
A,0,2,3
B,3,4,2
C,1,2,4
The following program:
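with open("mat.csv") as f:
    # The header row holds the column names; drop the unused corner cell.
    columns = f.readline().strip().split(',')[1:]
    for line in f:
        tokens = line.strip().split(',')
        row = tokens[0]  # the first field of each line is the row name
        for column, cell in zip(columns, tokens[1:]):
            print('{},{},{}'.format(row, column, cell))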
prints out:
A,A,0
A,B,2
A,C,3
B,A,3
B,B,4
B,C,2
C,A,1
C,B,2
C,C,4
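Since the file is very large and the question writes to an output file rather than printing, the same loop can also be written with the standard csv module, which streams the input one row at a time and copes with quoted fields. This is only a minimal sketch for Python 3; the function name matrix_to_edges and its two path parameters are made up for illustration and are not part of the original answer.

```python
import csv


def matrix_to_edges(in_path, out_path):
    """Stream a labelled n*n matrix into row,column,value lines.

    Hypothetical helper: the name and parameters are assumptions.
    """
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout, lineterminator="\n")
        # Header row: column names, without the unused corner cell.
        columns = next(reader)[1:]
        for record in reader:
            if not record:
                continue  # skip blank lines
            row = record[0]
            # Pair every cell with the name of the column it sits in.
            for column, cell in zip(columns, record[1:]):
                writer.writerow([row, column, cell])
```

It could be called as matrix_to_edges("mat.csv", "edges.csv"), where edges.csv is an assumed output name. Only one input row is held in memory at a time, so the size of the matrix should not matter much.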
To generate only the upper triangle (the diagonal and everything above it), you can use the following script:
with open("mat.csv") as f:
    columns = f.readline().strip().split(',')[1:]
    for i, line in enumerate(f):
        tokens = line.strip().split(',')
        row = tokens[0]
        # In data row i, skip the first i cells so only columns i..n-1 remain.
        for column, cell in zip(columns[i:], tokens[i+1:]):
            print('{},{},{}'.format(row, column, cell))
which results in the output:
A,A,0
A,B,2
A,C,3
B,B,4
B,C,2
C,C,4
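If the diagonal entries (A,A and so on) are not wanted either, the slice only has to start one position later. The generator below is a rough sketch of that idea; upper_triangle and its include_diagonal flag are hypothetical names introduced here, not something from the original script.

```python
def upper_triangle(lines, include_diagonal=True):
    """Yield (row, column, cell) for entries on or above the diagonal.

    Hypothetical helper: `lines` is any iterable of CSV lines, such as
    an open file. Assumes plain comma-separated fields without quoting.
    """
    lines = iter(lines)
    # Header row: column names, without the unused corner cell.
    columns = next(lines).strip().split(",")[1:]
    # Start at the diagonal (offset 0) or just right of it (offset 1).
    offset = 0 if include_diagonal else 1
    for i, line in enumerate(lines):
        tokens = line.strip().split(",")
        row, cells = tokens[0], tokens[1:]
        start = i + offset
        for column, cell in zip(columns[start:], cells[start:]):
            yield row, column, cell
```

Passing an open file, for example upper_triangle(open("mat.csv"), include_diagonal=False), would yield only the pairs strictly above the diagonal.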