I am using Python's `imaplib` module, specifically the `IMAP4_SSL` class, to get emails from a server. When I download messages, non-ASCII characters are usually escaped as quoted-printable escape codes, which I decode with the `quopri` module.
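For reference, decoding a quoted-printable fragment with `quopri` looks roughly like this. The sample string and the UTF-8 charset are assumptions for illustration; in a real message the charset comes from the part's `Content-Type` header.

```python
import quopri

# Hypothetical quoted-printable fragment: "=C3=A9" is the UTF-8 encoding of "é".
raw = b"Lib=C3=A9ration"

# quopri.decodestring undoes the =XX escapes and returns bytes;
# the charset (assumed UTF-8 here) has to be applied separately.
decoded = quopri.decodestring(raw).decode("utf-8")
print(decoded)
```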
When I use the `list` method of the `IMAP4_SSL` object, however, non-ASCII characters are escaped as an ampersand, a short code (three characters in these examples), then a dash, which looks like this:
(\HasNoChildren) "/" "Lib&AOk-rations/Lib&AOk-ration Bilan"
(\HasNoChildren) "/" "Poly/Comite&AwE- de discipline e&AwE-tudiante"
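`IMAP4_SSL.list()` returns a `(typ, data)` pair where `data` is a list of byte strings like the lines above. To get at the raw mailbox name before decoding it, each line can be split into flags, delimiter and name. This is a minimal sketch that assumes both the delimiter and the name are double-quoted, as in this output; `parse_list_line` is a hypothetical helper, and unquoted atoms, `NIL` delimiters and `{n}` literals are not handled.

```python
import re

# Matches lines of the form:  (flags) "delimiter" "mailbox name"
# Assumption: delimiter and name are both double-quoted.
# Backslash escapes inside the quoted name are matched but left as-is.
LIST_RE = re.compile(r'\((?P<flags>[^)]*)\) "(?P<delim>[^"]*)" "(?P<name>(?:[^"\\]|\\.)*)"')


def parse_list_line(line: bytes):
    """Split one line from IMAP4_SSL.list() into (flags, delimiter, raw_name)."""
    m = LIST_RE.match(line.decode("ascii"))
    if m is None:
        raise ValueError(f"unrecognised LIST line: {line!r}")
    return m.group("flags").split(), m.group("delim"), m.group("name")


flags, delim, name = parse_list_line(
    b'(\\HasNoChildren) "/" "Lib&AOk-rations/Lib&AOk-ration Bilan"'
)
print(flags, delim, name)
```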
I have never seen this way of escaping characters before, and I can't find anything about it because I don't know what it's called, and search engines keep ignoring the "&" in my queries (putting it in quotes gives the same results).
Looking at RFC 2060, which describes IMAP4rev1, section 5.1.3 ("Mailbox International Naming Convention") specifies how mailbox names are encoded (RFC 3501, which obsoletes RFC 2060, keeps the same section). `&-` represents a literal `&`; otherwise, `&` and `-` delimit runs of modified base64 that encode UTF-16 text. The scheme is called modified UTF-7. I found a gist by Oleg Buevich that claims to correctly encode and decode these modified UTF-7 strings, and as far as I can tell it works. For the mailboxes listed in the question, I get:
(\HasNoChildren) "/" "Libérations/Libération Bilan"
(\HasNoChildren) "/" "Poly/Comité de discipline étudiante"
which are correct.
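The gist itself is not reproduced here. As an independent illustration of the RFC rules, here is a minimal sketch of a modified UTF-7 codec; `imap_utf7_decode` and `imap_utf7_encode` are hypothetical names, not part of `imaplib` or of the gist. Decoding restores the `=` padding that modified base64 omits, maps `,` back to `/`, and decodes the bytes as UTF-16BE. Python's built-in `utf-7` codec uses `+` and `/` instead of `&` and `,`, so it cannot be applied directly. Note that the second mailbox is stored with a combining acute accent (`&AwE-` is U+0301), so comparing it with precomposed text such as "é" requires `unicodedata.normalize("NFC", ...)`.

```python
import base64
import re
import unicodedata

# A shifted run: "&", zero or more modified-base64 characters, then "-".
_B64_RUN = re.compile(r"&([A-Za-z0-9+,]*)-")


def imap_utf7_decode(name: str) -> str:
    """Decode an IMAP modified UTF-7 mailbox name (RFC 2060/3501, 5.1.3)."""
    def _decode_run(m: re.Match) -> str:
        run = m.group(1)
        if not run:                   # "&-" stands for a literal "&"
            return "&"
        b64 = run.replace(",", "/")   # modified base64 uses "," instead of "/"
        b64 += "=" * (-len(b64) % 4)  # padding is omitted, so restore it
        return base64.b64decode(b64).decode("utf-16-be")
    return _B64_RUN.sub(_decode_run, name)


def imap_utf7_encode(text: str) -> str:
    """Encode text as an IMAP modified UTF-7 mailbox name."""
    out = []
    pending = []                      # non-ASCII characters awaiting encoding

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in text:
        if 0x20 <= ord(ch) <= 0x7E:   # printable US-ASCII represents itself
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


for raw in ("Lib&AOk-rations/Lib&AOk-ration Bilan",
            "Poly/Comite&AwE- de discipline e&AwE-tudiante"):
    # NFC folds "e" + U+0301 into the precomposed "é".
    print(unicodedata.normalize("NFC", imap_utf7_decode(raw)))
```

Encoding only ever emits `&...-` runs for characters outside printable ASCII, so names that are plain ASCII (apart from `&`) pass through unchanged.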