python, encoding, base64, processing-efficiency, base32

What makes repeated base64 and base32 encoding so slow? And how can it be made faster?


I was trying out some different encodings, and when I encoded some text repeatedly in base64/base32 (which of the two is used for each layer depends on a semi-random boolean list), I noticed it was ridiculously slow. That surprised me because I thought these encodings were particularly fast. I can't figure out why it's so slow; it'd be cool if you could help me.

This is the relevant part of the code:

from base64 import b64encode, b32encode
from random import random as rn

big_number = int(input("The number of encoding layers : "))
# one coin flip per layer: True -> base64, False -> base32
bool_list = [rn() < 0.5 for _ in range(big_number)]
sample_text = bytes("lorem ipsum", "utf8")
for curr_bool in bool_list:
    # each layer re-encodes the output of the previous layer
    sample_text = b64encode(sample_text) if curr_bool else b32encode(sample_text)

Solution

  • Memory- and time-expensive operations. The answer builds on a pivotal comment by Wups:

    If you encode bytes with base64, the result is longer than the input. If you take this result and encode it again in a loop, you have exponential growth.
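
    As a quick sanity check (a minimal sketch, not part of the original script), the single-pass expansion can be measured directly. For this short input the ratios are even higher than the asymptotic 4/3 (base64) and 8/5 (base32) because of block padding:

    from base64 import b64encode, b32encode

    data = b"lorem ipsum"  # 11 bytes
    for label, encode in (("base64", b64encode), ("base32", b32encode)):
        out = encode(data)
        # base64 emits 4 chars per 3 input bytes, base32 emits 8 chars per 5
        print(label, len(data), "->", len(out), round(len(out) / len(data), 3))
    # base64 11 -> 16 1.455
    # base32 11 -> 24 2.182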

    The following modified script shows the per-layer growth ratio for the base64 and base32 encodings:

    from base64 import b64encode, b32encode
    from random import random as rn

    big_number = int(input("The number of encoding layers : "))
    bool_list = [rn() < 0.5 for _ in range(big_number)]
    sample_text = bytes("lorem ipsum", "utf8")
    sample_len = len(sample_text)
    current_len = sample_len
    for ii, curr_bool in enumerate(bool_list):
        sample_text = b64encode(sample_text) if curr_bool else b32encode(sample_text)
        # layer index, encoding used (True = base64), new length, growth ratio
        print(ii, curr_bool, len(sample_text), len(sample_text) / current_len)
        current_len = len(sample_text)
    

    **Sample output** of running python .\SO\71009943.py (truncated):

    The number of encoding layers : 30
    …
    24 True 172320 1.3333333333333333
    25 False 275712 1.6
    26 True 367616 1.3333333333333333
    27 False 588192 1.600017409470752
    28 True 784256 1.3333333333333333
    29 False 1254816 1.6000081606006202
    

    So the loop is slow not because base64/base32 themselves are slow, but because the data grows exponentially, like compound interest with a per-layer ratio between base64's 4/3 and base32's 8/5 = 1.6. The resulting length is therefore roughly bounded by:

    big_number = 30
    round(sample_len * (4/3) ** big_number)
    # 61596
    round(sample_len * 1.6 ** big_number)
    # 14621508
    

    and for bigger numbers:

    big_number = 50
    round(sample_len * (4/3) ** big_number)
    # 19423591
    round(sample_len * 1.6 ** big_number)
    # 176763184868
    

    and

    big_number = 99
    round(sample_len * (4/3) ** big_number)
    # 25723354884215
    round(sample_len * 1.6 ** big_number)
    # 1775296791184759324672
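
    So the practical answer to "how to make it faster" is: you can't speed up the encoders themselves; the cost is inherent in the exponential data growth. What you can do is predict the final size up front, without encoding anything, and refuse unreasonable layer counts. A minimal sketch (the helper name predicted_length is hypothetical, not from the original code), using the exact per-block output sizes 4*ceil(n/3) for base64 and 8*ceil(n/5) for base32:

    from math import ceil
    from random import random as rn

    def predicted_length(start_len, bool_list):
        # base64 emits 4 chars per 3-byte block, base32 emits 8 chars
        # per 5-byte block; both round the last partial block up
        n = start_len
        for use_b64 in bool_list:
            n = 4 * ceil(n / 3) if use_b64 else 8 * ceil(n / 5)
        return n

    bool_list = [rn() < 0.5 for _ in range(30)]
    print(predicted_length(len(b"lorem ipsum"), bool_list))

    Because encoded lengths are deterministic, this reproduces the lengths printed by the instrumented loop above, so the predicted size can be checked against a memory budget before any real encoding is done.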