
Python algorithm speed up / Performance tips

I'm working with large files (over 2 GB) and have a lot of processing functions for the data. My problem is that the processing takes a very long time (A LOT) to finish. Of all the functions, the one that seems to take the longest is this one:

    def BinLsb(data):
        Len = len(data)
        databin = [0] * (int(Len))
        num_of_bits = 8
        ###convert to bin the octets and LSB first
        for i in range(Len):
            newdatabin = bin(int(data[i], 16))[2:].zfill(num_of_bits)[::-1]
            databin[i] = newdatabin
        ###group the 14bit and LSB again
        databin = ''.join(databin)
        composite_list = [databin[x:x + 14] for x in range(0, len(databin), 14)]
        LenComp = len(composite_list)
        for i in range(LenComp):
            composite_list[i] = (int(str(composite_list[i])[::-1], 2))
        return composite_list

I'd really appreciate some performance tips / another approach to this algorithm in order to save me some time. Thanks in advance!


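  • Basic analysis of your function: it makes three passes over the data and builds three intermediate structures (the per-octet list, the joined bit string, and the list of 14-bit chunks). Time and space are therefore both O(n), with a constant factor of roughly three. My suggestion is to loop once and use a generator, so the intermediate structures are never held in memory all at once. Expect the memory saving to be large; the time saving is worth measuring, since the per-octet string conversions are still there.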

    I rewrote your code as a generator and removed some unneeded variables:

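    def binLsb(data):
        databin = ""
        num_of_bits = 8
        for octet in data:
            # One hex octet -> 8-character bit string, LSB first
            databin += bin(int(octet, 16))[2:].zfill(num_of_bits)[::-1]
            # Emit every complete 14-bit group (>= so an exact 14 is not held back)
            while len(databin) >= 14:
                yield int(databin[:14][::-1], 2)
                databin = databin[14:]
        # Flush the final partial group, as the original function does
        if databin:
            yield int(databin[::-1], 2)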
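
    If the string handling itself turns out to be the bottleneck, the same regrouping can probably be done with integer shifts and masks instead of bit strings. The sketch below is only an illustration: the name bin_lsb_ints is made up, and it assumes, as your code implies, that every element of data is a hex string for a single octet (0 to 255). It should produce the same values as your BinLsb, including the final partial group, but check it against your real data before relying on it.

```python
def bin_lsb_ints(data):
    """Yield 14-bit values from hex octets, treating the input as an LSB-first bit stream.

    Assumption: each item of `data` is a hex string for one octet, e.g. "3f".
    """
    acc = 0      # bit accumulator; the oldest unread bit sits at position 0
    nbits = 0    # number of valid bits currently held in acc
    for octet in data:
        # Place the new octet above the bits that are already waiting
        acc |= int(octet, 16) << nbits
        nbits += 8
        # Emit every complete 14-bit group
        while nbits >= 14:
            yield acc & 0x3FFF   # low 14 bits
            acc >>= 14
            nbits -= 14
    # Flush a final partial group, mirroring the last short slice in BinLsb
    if nbits:
        yield acc


if __name__ == "__main__":
    print(list(bin_lsb_ints(["ff", "3f"])))  # [16383, 0]
```

    Since it is a generator, you can iterate over it directly or wrap it in list(...) if you really need all values at once. The accumulator never holds more than 21 bits, so memory use does not grow with the file size.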

