
Printing strings with UTF-8 encoded characters, e.g.: "\u00c5\u009b"


I would like to print strings encoded like this one: "Cze\u00c5\u009b\u00c4\u0087" but I have no idea how. The example string should be printed as: "Cześć".

What I have tried is:

# "text" is used instead of "str" so the built-in str type is not shadowed
text = "Cze\u00c5\u009b\u00c4\u0087"
print(text)
# gives: CzeÅÄ

str_bytes = text.encode("unicode_escape")
print(str_bytes)
# gives: b'Cze\\xc5\\x9b\\xc4\\x87'

text = str_bytes.decode("utf8")
print(text)
# gives: Cze\xc5\x9b\xc4\x87

Whereas

print(b"Cze\xc5\x9b\xc4\x87".decode("utf8"))

gives "Cześć", but I don't know how to transform the "Cze\xc5\x9b\xc4\x87" string to the b"Cze\xc5\x9b\xc4\x87" bytes.

I also know that the problem is the additional backslashes in the byte representation after encoding the original string with the "unicode_escape" codec, but I don't know how to get rid of them: str_bytes.replace(b'\\\\', b'\\') doesn't work.


Solution

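  • Use raw_unicode_escape, which writes each code point below U+0100 as a single raw byte instead of an escaped \xNN sequence: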

    text = 'Cze\u00c5\u009b\u00c4\u0087'
    text_bytes = text.encode('raw_unicode_escape')
    print(text_bytes.decode('utf8')) # outputs Cześć
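To see why this works, it may help to compare the codecs side by side. The sketch below is only an illustration: the helper name fix_mojibake is hypothetical, and it assumes every character in the input is in the range U+0000 to U+00FF, as in the example string. Under that assumption, latin-1 produces the same bytes as raw_unicode_escape, but it raises UnicodeEncodeError on characters outside that range instead of silently emitting \uXXXX escapes.

```python
def fix_mojibake(text: str) -> str:
    """Hypothetical helper: treat each code point (U+0000..U+00FF) as one
    byte, then decode the resulting byte string as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


text = "Cze\u00c5\u009b\u00c4\u0087"

# unicode_escape writes a literal backslash followed by "xc5" etc.,
# so the result is ASCII text, not the original UTF-8 bytes.
escaped = text.encode("unicode_escape")

# raw_unicode_escape and latin-1 both map U+00C5 to the single byte 0xC5.
raw = text.encode("raw_unicode_escape")
latin = text.encode("latin-1")

fixed = fix_mojibake(text)
print(fixed)
```

Here raw and latin are both b'Cze\xc5\x9b\xc4\x87', which is the UTF-8 encoding of the intended word, while escaped contains literal backslash characters and therefore decodes to the text Cze\xc5\x9b\xc4\x87 rather than to the accented letters.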