I need to convert an array of floats, 500e6 long, to an ASCII file. Doing this in a loop takes a long time. I wonder if there is a faster way to do this.
I used a standard loop: NumSamples is 200e6; dataword is a vector of 200e6 values.
NumSamples=len(dataword)
for i in range(1,NumSamples):
    MyAscii=str(dataword[i])+"\n"
    fout.write(MyAscii)
The language is Python.
Try batching the writes, so that you don't call fout.write once for every single float:
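from itertools import batched  # available in Python 3.12+

BATCH_SIZE = 1000
samples = iter(dataword)
# Skip first entry, like in the original code.
next(samples, None)
for batch in batched(samples, BATCH_SIZE):
    # Join up to 'BATCH_SIZE' floats in one string before writing it to file.
    # The trailing newline keeps the last float of this batch from running
    # into the first float of the next batch.
    MyAscii='\n'.join(map(str, batch)) + '\n'
    fout.write(MyAscii)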
This will only call fout.write once every 1000 floats, at the cost of holding the string for 1000 floats in memory (negligible). As a rule of thumb, use the biggest BATCH_SIZE you can get without running out of memory.
Or, if you're using a Python version older than 3.12 that doesn't have itertools.batched, here's a hand-rolled version:
BATCH_SIZE = 1000
NumSamples=len(dataword)
# Start at 1 to skip the first entry, like in the original code.
for batch_i in range(1,NumSamples,BATCH_SIZE):
    # Account for sample number not being divisible by batch size.
    limit = min(NumSamples, batch_i+BATCH_SIZE)
    # Join up to 'BATCH_SIZE' floats in one string before writing it to file.
    # The trailing newline keeps consecutive batches from running together.
    MyAscii='\n'.join(str(dataword[i]) for i in range(batch_i, limit)) + '\n'
    fout.write(MyAscii)
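If you want to convince yourself that batching does not change the file contents, here is a minimal sketch that compares a batched writer against the plain per-sample loop on a tiny input. The helper name write_floats_batched and the sample values are made up for illustration, and io.StringIO stands in for the real output file. Unlike the snippets above, it writes every sample; to skip the first entry as in the original loop, you could pass islice(dataword, 1, None) instead.

```python
import io
from itertools import islice


def write_floats_batched(values, fout, batch_size=1000):
    """Write one value per line, calling fout.write once per batch.

    Hypothetical helper; works on Python versions without itertools.batched.
    """
    it = iter(values)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        # The trailing newline keeps the last value of one batch from
        # running into the first value of the next batch.
        fout.write("\n".join(map(str, batch)) + "\n")


# Tiny stand-in for the real data (illustrative values only).
dataword = [0.5, 1.25, -3.0, 2.0, 7.75]

# Batched version, with a batch size that does not divide the length evenly.
batched_out = io.StringIO()
write_floats_batched(dataword, batched_out, batch_size=2)

# Reference version: one write call per sample.
loop_out = io.StringIO()
for x in dataword:
    loop_out.write(str(x) + "\n")

# Both approaches should produce identical text.
assert batched_out.getvalue() == loop_out.getvalue()
```

With a real file you would pass the object returned by open(path, "w") as fout; the comparison against the loop is only there as a sanity check.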