python, data-conversion

Saving a float array to an ASCII file


I need to convert an array of floats, 500e6 long, to an ASCII file. Doing this in a loop takes a long time. I wonder if there is a faster way of doing this.

I used a standard loop: NumSamples is 200e6; dataword is a vector of 200e6 values.

NumSamples=len(dataword)
for i in range(1,NumSamples):
    MyAscii=str(dataword[i])+"\n"
    fout.write(MyAscii) 

Language: Python.


Solution

  • Try batching writes, so that you don't call fout.write once for every single float:

    from itertools import batched  # Python 3.12+
    BATCH_SIZE = 1000
    samples = iter(dataword)
    # Skip the first entry, as the original loop (range(1, ...)) does.
    next(samples, None)
    for batch in batched(samples, BATCH_SIZE):
        # Join up to BATCH_SIZE floats into one string. The trailing "\n" keeps
        # the last value of this batch from running into the first of the next.
        MyAscii = '\n'.join(map(str, batch)) + '\n'
        fout.write(MyAscii)
    

    This calls fout.write only once every 1000 floats, at the cost of holding the string for 1000 floats in memory (a few tens of kilobytes, which is negligible). As a rule of thumb, use the largest BATCH_SIZE you can afford without running out of memory.


    Or, if you're using a Python version older than 3.12 that doesn't have itertools.batched, here's a hand-rolled version:

    BATCH_SIZE = 1000
    NumSamples = len(dataword)
    # Start at 1 to skip the first entry, as the original loop does.
    for batch_i in range(1, NumSamples, BATCH_SIZE):
        # Account for the sample count not being divisible by the batch size.
        limit = min(NumSamples, batch_i + BATCH_SIZE)
        # Join up to BATCH_SIZE floats into one string, with a trailing "\n"
        # so that consecutive batches stay on separate lines.
        MyAscii = '\n'.join(str(dataword[i]) for i in range(batch_i, limit)) + '\n'
        fout.write(MyAscii)
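Alternatively, on Python versions before 3.12 you can keep the `itertools.batched` loop unchanged by defining a small stand-in. The sketch below is modelled on the roughly equivalent recipe in the `itertools` documentation; the name `batched_compat` is hypothetical, and unlike the built-in in Python 3.13+ it has no `strict` option.

```python
import sys
from itertools import islice

if sys.version_info >= (3, 12):
    # Use the real implementation when it is available.
    from itertools import batched as batched_compat
else:
    def batched_compat(iterable, n):
        """Yield tuples of up to n items from iterable; the last one may be shorter."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        # islice pulls the next n items; an empty tuple means the input is exhausted.
        while batch := tuple(islice(it, n)):
            yield batch
```

With this in place, the batched loop works as-is once `batched(samples, BATCH_SIZE)` is replaced by `batched_compat(samples, BATCH_SIZE)`.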
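To check how much batching helps on your own machine, and roughly where larger batches stop paying off, a small timing harness like the one below can be used. It is a sketch under stated assumptions: the data is synthetic (200,000 random floats rather than 200e6), output goes to an in-memory `io.StringIO` instead of a real file (timing a file opened with `open(path, "w")` would be more faithful), the helper names `write_batched` and `time_batch_sizes` are hypothetical, and it writes every sample rather than skipping the first. A batch size of 1 reproduces the one-write-per-float behaviour of the original loop.

```python
import io
import random
import time


def write_batched(fout, values, batch_size):
    """Write one float per line, joining batch_size values per write() call."""
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        # Trailing "\n" keeps the last value of this chunk on its own line.
        fout.write("\n".join(map(str, chunk)) + "\n")


def time_batch_sizes(values, batch_sizes):
    """Return {batch_size: seconds} and check that every size gives identical text."""
    timings = {}
    reference = None
    for size in batch_sizes:
        buf = io.StringIO()
        t0 = time.perf_counter()
        write_batched(buf, values, size)
        timings[size] = time.perf_counter() - t0
        text = buf.getvalue()
        if reference is None:
            reference = text
        assert text == reference, f"batch size {size} produced different output"
    return timings


if __name__ == "__main__":
    data = [random.random() for _ in range(200_000)]  # synthetic stand-in for dataword
    for size, secs in time_batch_sizes(data, [1, 10, 100, 1_000, 10_000]).items():
        print(f"BATCH_SIZE={size:>6}: {secs:.3f} s")
```

Because every batch size must produce byte-identical text, the harness also guards against the missing-newline mistake at batch boundaries.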