python base64 imap imaplib utf-7

Escaped characters (of form &AOk- for «é») in IMAP4 list response, using IMAP4_SSL in Python


I am using Python's imaplib module, specifically the IMAP4_SSL class to get emails from a server.

When I download messages, non-ASCII characters are usually escaped as quoted-printable escape codes, which I decode with the quopri module.

When I use the list method of the IMAP4_SSL object, however, non-ASCII characters are escaped as an ampersand, a short code (three letters in the examples below), and a dash, which looks like this:

(\HasNoChildren) "/" "Lib&AOk-rations/Lib&AOk-ration Bilan"
(\HasNoChildren) "/" "Poly/Comite&AwE- de discipline e&AwE-tudiante"

I have never seen this way of escaping characters before, and I can't find it anywhere because I don't know what it's called and search engines keep ignoring the "&" in my queries (I've tried quotes and I get the same results).


Solution

  • RFC 2060, which describes IMAP, covers mailbox naming in section 5.1.3. Mailbox names use a modified UTF-7 encoding: &- stands for a literal &, and otherwise & and - delimit a run of modified-base64 text. I found a gist by Oleg Buevich that claims to encode and decode these modified UTF-7 strings correctly, and it works as far as I can tell. For the mailboxes listed in the question, I get:

    (\HasNoChildren) "/" "Libérations/Libération Bilan"
    (\HasNoChildren) "/" "Poly/Comité de discipline étudiante"
    

    which are correct.
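
For reference, the decoding direction is small enough to sketch by hand. This is not the code from the gist mentioned above, only a minimal illustration of the rules in section 5.1.3, assuming the input is a well-formed mailbox name already converted to str. The names decode_mailbox_name, _decode_run and _SHIFT_RUN are made up for this example.

```python
import base64
import re

# Matches one "&...-" run. An empty payload ("&-") is the escape for a literal "&".
_SHIFT_RUN = re.compile(r"&([^-]*)-")


def _decode_run(match):
    payload = match.group(1)
    if not payload:
        return "&"
    # Modified base64 uses "," where standard base64 uses "/" and drops "=" padding.
    b64 = payload.replace(",", "/")
    b64 += "=" * (-len(b64) % 4)
    # The decoded bytes are UTF-16 code units in big-endian order.
    return base64.b64decode(b64).decode("utf-16-be")


def decode_mailbox_name(name):
    """Decode an IMAP modified UTF-7 mailbox name (str) into readable text."""
    return _SHIFT_RUN.sub(_decode_run, name)


if __name__ == "__main__":
    print(decode_mailbox_name("Lib&AOk-rations/Lib&AOk-ration Bilan"))
    print(decode_mailbox_name("Poly/Comite&AwE- de discipline e&AwE-tudiante"))
```

imaplib returns the list response as bytes, so each line would need something like line.decode("ascii") before being passed in. Note that &AwE- decodes to the combining acute accent U+0301 rather than a precomposed é, so the second name comes out in decomposed form; unicodedata.normalize("NFC", ...) can be applied afterwards if precomposed characters are wanted.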