I'm using Python 2.7, and I am trying to properly decode the subject header line of an email. The source of the email is:
Subject: =?UTF-8?B?VGkgw6ggcGlhY2l1dGEgbGEgZGVtbz8gU2NvcHJpIGFsdHJlIG4=?=
I use the function decode_header(header) from the email.header library, and the result is:
[('Ti \xc3\xa8 piaciuta la demo? Scopri altre n', 'utf-8')]
The '\xc3\xa8' part should correspond to the 'è' character, but it is not decoded/displayed correctly. Another example:
Subject: =?iso-8859-1?Q?niccol=F2_cop?= =?iso-8859-1?Q?ernico?=
Result:
[('niccol\xf2 copernico', 'iso-8859-1')]
How can I obtain the correct string?
You are getting the correct data. It's a byte string that is still encoded (UTF-8 in the first case, iso-8859-1 in the second); you need to decode it with the returned charset to get the actual unicode string.
For example:
>>> print unicode('Ti \xc3\xa8 piaciuta la demo? Scopri altre n', 'utf-8')
Ti è piaciuta la demo? Scopri altre n
Or:
>>> print unicode('niccol\xf2 copernico', 'iso-8859-1')
niccolò copernico
That's why you get back both the header data and the encoding.
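If you want to do this for any header, here is a minimal sketch of a helper that decodes each (data, charset) pair returned by decode_header and joins the results. The name header_to_unicode is made up for illustration, and falling back to ASCII with 'replace' when no charset is given is my assumption, not something the email package mandates. A header that names a charset Python does not know would still raise LookupError.

```python
# -*- coding: utf-8 -*-
from email.header import decode_header


def header_to_unicode(raw_header):
    """Return the header value as a unicode string (hypothetical helper)."""
    parts = []
    for data, charset in decode_header(raw_header):
        # On Python 2 `data` is always a byte string (bytes is an alias of str).
        # On Python 3 it may already be text when the header has no encoded words.
        if isinstance(data, bytes):
            # Assumption: parts without a charset are treated as ASCII,
            # and undecodable bytes are replaced rather than raising.
            data = data.decode(charset or 'ascii', 'replace')
        parts.append(data)
    return u''.join(parts)


subject_1 = header_to_unicode(
    '=?UTF-8?B?VGkgw6ggcGlhY2l1dGEgbGEgZGVtbz8gU2NvcHJpIGFsdHJlIG4=?=')
subject_2 = header_to_unicode(
    '=?iso-8859-1?Q?niccol=F2_cop?= =?iso-8859-1?Q?ernico?=')
# subject_1 == u'Ti \xe8 piaciuta la demo? Scopri altre n'
# subject_2 == u'niccol\xf2 copernico'
```

Printing subject_1 or subject_2 on a terminal that can display these characters should then show 'è' and 'ò' as expected. The standard library also provides email.header.make_header, which accepts the list returned by decode_header; calling unicode() on the resulting Header object is another way to get the decoded text.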