python unicode-string

Convert a raw string (containing escape sequences) to a Unicode/UTF-8 string


In Python 3, how do I convert an ASCII raw string (one that contains escape sequences) into a proper Unicode string?

As an example:

a = "ä"                         # note the umlaut
b = bytearray( a, "utf8" )      # yields: bytearray(b'\xc3\xa4')
s = r'\xc3\xa4'                 # note it's a raw string

In this example, the source string s is the textual form of b, which holds the UTF-8 bytes of the Unicode string a. The goal is to find a function F such that a == F(s). Thanks for your help!

I tried every combination of encode, decode, and codecs that I could think of. Note in particular that the following evaluates to False:

a == s.encode('latin-1').decode('unicode-escape')

Solution

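  • You were so close! Decoding with unicode-escape turns each \xNN escape into the code point U+00NN, so your result holds the UTF-8 bytes as two separate characters rather than as "ä". Encode that result back to latin-1 to recover the bytes, then decode them as UTF-8: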

    s.encode('latin-1').decode('unicode-escape').encode('latin-1').decode('utf-8')
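To make the chain easier to follow, here is a minimal sketch that wraps it in a function and names each intermediate value. The function name unescape_utf8 and the variable names are made up for illustration; the four calls are exactly the ones in the one-liner above. It assumes the input contains only characters in the latin-1 range (true for pure ASCII input such as s); anything outside that range would make the first encode raise UnicodeEncodeError.

```python
def unescape_utf8(s: str) -> str:
    # Hypothetical helper: plays the role of F from the question.
    # 1. str -> bytes. The escapes are plain ASCII, so latin-1 is lossless here.
    escaped = s.encode("latin-1")
    # 2. Interpret the \xNN escapes. Each one becomes the code point U+00NN,
    #    so the UTF-8 bytes end up as separate characters at this stage.
    as_code_points = escaped.decode("unicode-escape")
    # 3. Map every code point U+00NN back to the single byte 0xNN.
    raw_bytes = as_code_points.encode("latin-1")
    # 4. Decode the recovered bytes as UTF-8.
    return raw_bytes.decode("utf-8")


a = "ä"
s = r"\xc3\xa4"

# The attempt from the question stops after step 2, which is why it is not equal to a.
print(s.encode("latin-1").decode("unicode-escape") == a)
# The full chain recovers the original string.
print(unescape_utf8(s) == a)
```

The first print shows False and the second shows True: the missing piece in the original attempt was the round trip through latin-1 followed by the UTF-8 decode.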