How to decode text in Python 3?


I have the text Aur\xc3\xa9lien and want to decode it with Python 3.8.

I tried the following

import codecs
s = "Aur\xc3\xa9lien"
codecs.decode(s, "utf-8")
codecs.decode(bytes(s), "utf-8")
codecs.decode(bytes(s, "utf-8"), "utf-8")

but none of them gives the correct result Aurélien.

How to do it correctly?

Also, is there a simple, authoritative page that describes all these encodings for Python?


Solution

  • The \xc3\xa9 escapes are the two UTF-8 bytes of é, so the data has to be a byte string rather than a str. Make a byte string by adding the letter 'b' to the front of the original literal, find its encoding, and then decode it.

    Try this:

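    import chardet

    # The b prefix makes this a bytes literal, so \xc3\xa9 are two raw bytes
    bs = b"Aur\xc3\xa9lien"

    # chardet only guesses; fall back to UTF-8 if it returns None
    encoding = chardet.detect(bs)["encoding"] or "utf-8"

    # Decode the bytes directly instead of re-encoding the str
    text = bs.decode(encoding)

    print(text)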
    

    If you are reading the text from a file, you can detect its encoding with the magic lib; see here: https://stackoverflow.com/a/16203777/1544937
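
If the value only exists as a str at runtime (so you cannot add the 'b' prefix to a literal), one common workaround is to round-trip through Latin-1. This is a minimal sketch, assuming the str was produced by reading UTF-8 bytes as Latin-1; the helper name repair_mojibake is made up for illustration and is not part of any library.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up in an existing str.

    Hypothetical helper for illustration only.
    """
    # Latin-1 maps code points 0-255 one-to-one onto byte values,
    # so encoding recovers the original bytes unchanged.
    raw = text.encode("latin-1")
    # Interpret those bytes as the UTF-8 they were meant to be.
    return raw.decode("utf-8")


s = "Aur\xc3\xa9lien"  # a str holding the characters 'Ã' and '©', not bytes
fixed = repair_mojibake(s)
print(fixed)  # Aurélien
```

The sketch raises UnicodeEncodeError if the str contains characters above U+00FF, and UnicodeDecodeError if the recovered bytes are not valid UTF-8, so it only suits input that matches the stated assumption.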