python string encoding quoted-printable

Python: fix French accents parsed as =C3=A9


In Python, I'm stuck with a few French strings whose accented characters I can't convert back to normal, e.g.:

word1 = 'install=C3=A9' # should be installé
word2 = 'transf=E9r=E9' # should be transféré
word3 = 'bient=C3=B4t'  # should be bientôt

Most of the documentation I read says to open files with encoding='utf-8' or similar, but here I'm stuck with actual strings. Is there a way to decode the strings, or should I build a giant chain of .replace() calls?


Solution

  • The encoding seems to be Quoted Printable.

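    import quopri

    word1 = 'install=C3=A9'
    # Undo the quoted-printable escaping; the result is raw bytes
    byte_string = quopri.decodestring(word1)
    # Interpret those bytes as UTF-8 text
    string = byte_string.decode('utf-8')
    print(string)  # installé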
    

    The function is documented as taking bytes, so it is cleaner to declare the words as bytes literals:

    word1 = b'install=C3=A9'
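
One caveat about the samples in the question: word2 ('transf=E9r=E9') looks like it was encoded as Latin-1 rather than UTF-8, since a lone byte 0xE9 followed by an ASCII letter is not valid UTF-8. Decoding it with 'utf-8' would therefore raise UnicodeDecodeError. A minimal sketch of one way to cope, assuming the only encodings involved are UTF-8 and Latin-1 (the helper name decode_qp and the fallback order are illustrative choices, not part of quopri):

```python
import quopri


def decode_qp(text, encodings=("utf-8", "latin-1")):
    """Decode a quoted-printable str, trying each candidate encoding in order.

    Hypothetical helper: the candidate list is an assumption about the data.
    """
    # The escaped form should be pure ASCII; this raises if it is not.
    raw = quopri.decodestring(text.encode("ascii"))
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError(f"could not decode {text!r} with any of {encodings}")


for word in ("install=C3=A9", "transf=E9r=E9", "bient=C3=B4t"):
    print(decode_qp(word))
```

UTF-8 is tried first because it is strict and rejects most Latin-1 byte sequences, whereas Latin-1 accepts any byte and so only makes sense as the last resort.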