I have the text "Aur\xc3\xa9lien" and want to decode it with Python 3.8.
I tried the following:
import codecs
s = "Aur\xc3\xa9lien"
codecs.decode(s, "utf-8")
codecs.decode(bytes(s), "utf-8")
codecs.decode(bytes(s, "utf-8"), "utf-8")
but none of them gives the correct result, Aurélien.
How can I do it correctly?
Also, is there a basic, authoritative page that describes all these encodings for Python?
First detect the encoding of the string, then decode it. To detect it you need a byte string, which you get by putting the letter b in front of the original string literal.
Try this:
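import chardet
s = "Aur\xc3\xa9lien"   # garbled str: UTF-8 bytes shown as one character each
bs = b"Aur\xc3\xa9lien"  # the same text as a byte string (note the b prefix)
# Guess the encoding from the raw bytes
encoding = chardet.detect(bs)["encoding"]
# Re-encode with the guessed encoding, then decode as UTF-8. This only repairs s if the
# guess is a single-byte encoding such as ISO-8859-1; if it is utf-8, s comes back unchanged.
fixed = s.encode(encoding).decode("utf-8")  # named fixed to avoid shadowing the built-in str
print(fixed)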
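The characters \xc3 and \xa9 in the str are exactly the two UTF-8 bytes of "é", each read as one Latin-1 character, and chardet's guess on a string this short is not guaranteed to be Latin-1. If you already know that diagnosis applies (UTF-8 bytes mis-decoded as Latin-1), here is a minimal sketch that skips detection and also shows why the question's third attempt returns the input unchanged. The variable names are illustrative.

```python
# Assumption: the str holds UTF-8 bytes that were decoded as Latin-1,
# so every original byte became one code point in U+0000..U+00FF.
broken = "Aur\xc3\xa9lien"

# Latin-1 maps U+0000..U+00FF back to the same byte values,
# recovering the original UTF-8 bytes.
raw = broken.encode("latin-1")

# Decode those bytes as what they really are: UTF-8.
fixed = raw.decode("utf-8")
print(fixed)  # Aurélien

# The question's third attempt encodes the garbled characters to UTF-8
# and decodes them again, which is a round trip back to the same str.
unchanged = bytes(broken, "utf-8").decode("utf-8")
print(unchanged == broken)  # True
```

If the garbled text contains characters outside Latin-1 (for example "€" or "™", which appear when the bytes were decoded as Windows-1252), encode("latin-1") raises UnicodeEncodeError, and encode("cp1252") is often the right substitute.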
If you are reading the text from a file, you can detect its encoding with the magic library; see https://stackoverflow.com/a/16203777/1544937
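The Python magic bindings wrap the libmagic system library, which is not always installed. If you would rather stay in the standard library, a simpler heuristic (a swapped-in technique, not what the linked answer does) is to read the file as bytes and try a short list of candidate encodings in order. The helper name read_text_guessing and the candidate list are hypothetical choices for illustration. Order matters: strict UTF-8 rejects most non-UTF-8 input, while latin-1 accepts every byte and therefore has to come last as a catch-all.

```python
import tempfile
from pathlib import Path

def read_text_guessing(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Hypothetical helper: decode a file with the first candidate that works.

    Returns (text, encoding_used).
    """
    raw = Path(path).read_bytes()  # read bytes so nothing is decoded yet
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate failed, try the next one
    raise ValueError(f"none of {candidates} could decode {path}")

# Usage: write the UTF-8 bytes from the question to a temp file and read them back.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(b"Aur\xc3\xa9lien")
    tmp_path = f.name

text, used = read_text_guessing(tmp_path)
print(text, used)  # Aurélien utf-8
Path(tmp_path).unlink()  # clean up the temp file
```

A successful decode only proves the bytes are valid in that encoding, not that it is the intended one, so this is a fallback rather than real detection.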