python python-3.x base64

Why do I need 'b' to encode a string with Base64?


I followed an example from the documentation of how to use Base64 encoding in Python:

>>> import base64
>>> encoded = base64.b64encode(b'data to be encoded')
>>> encoded
b'ZGF0YSB0byBiZSBlbmNvZGVk'

But, if I try to encode a normal string - leaving out the leading b:

>>> encoded = base64.b64encode('data to be encoded')

I get a TypeError. In older versions of Python it looked like:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python32\lib\base64.py", line 56, in b64encode
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str

In more recent versions it might look like:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: a bytes-like object is required, not 'str'

Why does this happen?


Solution

  • Base64 encoding takes 8-bit binary byte data and encodes it using only the characters A-Z, a-z, 0-9, +, /* so that it can be transmitted over channels that do not preserve all 8 bits of data, such as email.

    Hence, it wants a string of 8-bit bytes. You create those in Python 3 with the b'' syntax.

    If you remove the b, it becomes a string. A string is a sequence of Unicode characters. base64 has no idea what to do with Unicode data, because it's not 8-bit. It's not really any bits, in fact, until you pick an encoding. :-)
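To illustrate why a str is "not really any bits", here is a small sketch showing that the same text becomes different byte sequences depending on the encoding you choose. The sample word "café" is just an illustrative value, not something from the question.

```python
text = "café"  # illustrative sample, 4 Unicode characters

# The same characters map to different bytes under different encodings.
utf8_bytes = text.encode("utf-8")        # 'é' takes two bytes here
latin1_bytes = text.encode("latin-1")    # 'é' takes one byte here
utf16_bytes = text.encode("utf-16-le")   # every character takes two bytes here

print(len(text), len(utf8_bytes), len(latin1_bytes), len(utf16_bytes))
# prints: 4 5 4 8
```

Since base64 cannot know which of these byte sequences you meant, it asks for bytes and leaves the choice of encoding to you.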

    In your second example:

    >>> encoded = base64.b64encode('data to be encoded')
    

    All the characters fit neatly into the ASCII character set, and base64 encoding is therefore actually a bit pointless. You can convert the string to ASCII bytes instead, with

    >>> encoded = 'data to be encoded'.encode('ascii')
    

    Or simpler:

    >>> encoded = b'data to be encoded'
    

    Which would be the same thing in this case.
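If you do want Base64 output from a str, a minimal sketch of the round trip looks like this. It assumes UTF-8 as the text encoding; that is a choice made for this example, and any encoding works as long as both ends agree on it.

```python
import base64

text = "data to be encoded"

# str -> bytes: pick an encoding explicitly (UTF-8 is an assumption here).
raw = text.encode("utf-8")

# bytes -> Base64 (the result is also a bytes object).
encoded = base64.b64encode(raw)
print(encoded)  # b'ZGF0YSB0byBiZSBlbmNvZGVk'

# Base64 output only contains ASCII characters, so it decodes safely to str.
encoded_text = encoded.decode("ascii")

# Reverse direction: Base64 -> bytes -> str.
decoded = base64.b64decode(encoded_text).decode("utf-8")
print(decoded)  # data to be encoded
```

Calling .decode("ascii") on the result is only needed when you want a str, for example to put the value into JSON or a text header.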


    * Most base64 flavours may also include a = at the end as padding. In addition, some base64 variants may use characters other than + and /. See the Variants summary table at Wikipedia for an overview.
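The padding and the alternative alphabet mentioned in the footnote can be seen with the standard library alone. The input bytes below are arbitrary values picked so that the differences show up.

```python
import base64

# Input whose length is not a multiple of 3 gets '=' padding.
one = base64.b64encode(b"a")      # b'YQ=='
two = base64.b64encode(b"ab")     # b'YWI='
three = base64.b64encode(b"abc")  # b'YWJj' (no padding needed)

# Bytes chosen so that the standard alphabet produces '+' and '/'.
data = b"\xfb\xff"
standard = base64.b64encode(data)          # b'+/8='
url_safe = base64.urlsafe_b64encode(data)  # b'-_8='

print(one, two, three, standard, url_safe)
```

base64.urlsafe_b64encode substitutes - for + and _ for /, which is the variant usually used in URLs and file names.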