encodingcharacter-encodingbase64passlib

Is this a bug in the passlib base64 encoding?


I am trying to decode an re-encode a bytesytring using passlibs base64 encoding:

from passlib.utils import binary

engine = binary.Base64Engine(binary.HASH64_CHARS)
s2 = engine.encode_bytes(engine.decode_bytes(b"1111111111111111111111w"))

print(s2)

This prints b'1111111111111111111111A' which is of course not what I expected. The last character is different.

Where is my mistake? Is this a bug?


Solution

  • No, it's not a bug.

    In all variants of Base64, every encoded character represents just 6 bits and depending on the number of bytes encoded you can end up with 0, 2 or 4 insignificant bits on the end.
    In this case the encoded string 1111111111111111111111w is 23 characters long, that means 23*6 = 138 bits which can be decoded to 17 bytes (136 bits) + 2 insignifcant bits.

    The encoding you use here is not Base64 but Hash64

    Base64 character map used by a number of hash formats; the ordering is wildly different from the standard base64 character map.

    In the character map
    HASH64_CHARS = u("./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
    we find A on index 12 (001100) and w on index 60 (111100)

    Now the 'trick' here is, that
    binary.Base64Engine(binary.HASH64_CHARS) has a default parameter big=False, which means that the encoding is done in little endian format by default.

    In your example it means that w is 001111 and A is 001100. During decoding the last two bits are cut off as they are not needed as explained above. When you encode it again, A is taken as the first character in the character map that can be used two encode 0011 plus two insignifiant bits.