Tags: python, matrix, optimization, physics

Optimizing Python code to read multiple big matrices from a text file


I have a numerical simulation that spits out a text file with a sequence of square matrices with complex entries. They look like this:

 (#,#)   (#,#)   (#,#)
 (#,#)   (#,#)   (#,#)
 (#,#)   (#,#)   (#,#)
-----

 (#,#)   (#,#)   (#,#)
 (#,#)   (#,#)   (#,#)
 (#,#)   (#,#)   (#,#)
-----

...

I want a fast way of reading these files and storing the matrices as numpy arrays in a list. I wrote the following Python code to try to solve this problem:

import numpy as np

with open(f'fort.{n}', 'r') as f:
    # split each line on spaces, dropping the empty strings between entries
    l = [[num for num in line.split(' ') if num != ''] for line in f]

# drop every line equal to the file's second-to-last line
# (the blank lines between blocks)
l = list(filter((l[-2]).__ne__, l))

# drop stray newline tokens left at the end of each row
l = [[num for num in line if num != '\n'] for line in l]
# split each '(re,im)' entry into its real and imaginary parts
l = [[num.strip('()\n').split(',') for num in line] for line in l]

# drop the '-----' separator lines
l = list(filter(([['-----']]).__ne__, l))

# build complex numbers from the (re, im) string pairs
l = [[float(num[0]) + 1j*float(num[1]) for num in line] for line in l]

# group every 160 consecutive rows into one 160x160 matrix
Solutions = []
for i in range(len(l)):
    if (i+1) % 160 == 0:
        Solutions.append(np.array(l[i-159:i+1]))

However, the output files contain hundreds of 160x160 matrices, and my program spends most of its running time reading them. I want to optimize this process but don't know how.


Solution

  • As @dankal444 pointed out in his comment, the .npy format is much better suited to what I want, which is transferring data from a Fortran program to a Python script that analyses it. With that in mind, I found a module that lets you write Fortran arrays as .npy files. It appears to be about as fast as writing the arrays to unformatted binary files. A sketch of the reading side in Python follows.
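
A minimal sketch of the Python side under this approach, assuming the Fortran program writes each matrix to its own .npy file; the glob pattern and file names below are hypothetical, not something prescribed by any particular module:

import glob
import numpy as np

# Load every matrix written by the Fortran side.
# 'matrix_*.npy' is an illustrative naming scheme; use whatever names
# your Fortran writer actually produces.
Solutions = [np.load(path) for path in sorted(glob.glob('matrix_*.npy'))]

Since np.load restores each array with the dtype it was written with (including complex), all of the text parsing and complex-number reconstruction above becomes unnecessary.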