
Error using imap-tools with an email address containing non-ascii characters in domain name


I use imap-tools to access my emails.

My problem is that I'm trying to access emails sent from someone whose email address contains special characters such as ø. I can't encode the address correctly because from_ only accepts a string, so I'm not getting anywhere.

import imap_tools

with imap_tools.MailBox('imap.gmx.net').login(email, password, 'INBOX') as mailbox:
    for msg in mailbox.fetch(imap_tools.AND(from_ = 'beskeder@mød.dk')):
        print('Found')

I shortened my code. I expect the program to print Found when an email sent by beskeder@mød.dk is in my mailbox. Emails from addresses without special characters are found.

The error message:

Traceback (most recent call last):
  File "/Users/user/Desktop/test.py", line 4, in <module>
    for msg in mailbox.fetch(imap_tools.AND(from_ = 'beskeder@mød.dk')):
  File "/Users/user/Library/Python/3.9/lib/python/site-packages/imap_tools/mailbox.py", line 130, in fetch
    nums = tuple((reversed if reverse else iter)(self.numbers(criteria, charset)))[limit_range]
  File "/Users/user/Library/Python/3.9/lib/python/site-packages/imap_tools/mailbox.py", line 67, in numbers
    encoded_criteria = criteria if type(criteria) is bytes else str(criteria).encode(charset)
UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 17: ordinal not in range(128)

I also tried passing 'beskeder@hottemøder.dk'.encode('ascii', 'ignore') instead, but that doesn't work either.

Error:

TypeError: "from_" expected str value, "<class 'int'>" received

When I convert it with str(), nothing happens.


Solution

  • Looking at the source of the library's fetch method in the current version, the failure inside the internal numbers method can be avoided by passing an appropriate charset, such as UTF-8, to fetch itself (not to AND). Something like the following may work around the problem:

    mailbox.fetch(imap_tools.AND(from_='beskeder@mød.dk'), charset='UTF-8')
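The traceback in the question points at the line in numbers that turns the criteria into bytes with `str(criteria).encode(charset)`. The sketch below reproduces only that step in plain Python, without a server, to show why the charset matters. The helper `encode_criteria` is hypothetical (it is not imap-tools code), and the search string `(FROM "...")` is an assumed shape that may differ from what `imap_tools.AND` actually builds.

```python
# Hypothetical stand-in for the encoding step shown in the traceback
# (imap_tools/mailbox.py, `numbers`); this is NOT the library's own code.
def encode_criteria(criteria, charset):
    """Return criteria as bytes, encoding str input with the given charset."""
    if isinstance(criteria, bytes):
        return criteria
    return str(criteria).encode(charset)


# Assumed shape of the search string; imap_tools may format it differently.
criteria = '(FROM "beskeder@mød.dk")'

# With an ASCII charset, ø cannot be encoded: same error as in the question.
try:
    encode_criteria(criteria, 'US-ASCII')
except UnicodeEncodeError as exc:
    print('US-ASCII failed:', exc.reason)

# With UTF-8, ø becomes the two bytes 0xC3 0xB8 and encoding succeeds.
print('UTF-8:', encode_criteria(criteria, 'UTF-8'))
```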
    

    The rest of this answer was written under the assumption that the library did not properly support the relevant RFC (specifically RFC 6855, IMAP Support for UTF-8). It has since been revised and is kept for reference, as it may be useful background on the related topics.

    Traditionally, an email address at the protocol level consists only of characters from the ASCII character set (even though internationalized email addresses are now standardized), so the library is reasonable in encoding the provided string into bytes with the ascii codec by default. Because ø has no ASCII equivalent, the error indicates that support for internationalized email addresses is not enabled (specifically RFC 6855; as noted, this may just be a configuration setting and/or an older version of the library).
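Since only non-ASCII text triggers the error, one option is to pick the charset per search value. The sketch below uses a hypothetical helper `pick_charset` built on `str.isascii()` (available since Python 3.7); it assumes the IMAP server accepts UTF-8 as a SEARCH charset, which is not guaranteed for every server.

```python
def pick_charset(search_text):
    """Hypothetical helper: choose a SEARCH charset for the given text.

    Pure-ASCII text can keep the US-ASCII default; anything else (such as
    the ø in mød.dk) needs a charset like UTF-8 that can represent it.
    """
    return 'US-ASCII' if search_text.isascii() else 'UTF-8'


for address in ('beskeder@example.dk', 'beskeder@mød.dk'):
    print(address, '->', pick_charset(address))
```

The result could then be passed as `charset=pick_charset(address)` to fetch, again assuming the server supports that charset.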

    The problematic character in that address appears in the domain part, which means the domain is an IDN (internationalized domain name). IDNs are not converted to bytes with any of the Unicode encodings but through their Punycode representation (related SO thread). If the library lacks RFC 6855 support, the domain portion (mød.dk) has to be converted to Punycode manually, which gives xn--md-lka.dk, so the address the library can handle becomes beskeder@xn--md-lka.dk.

    However, since this is an IMAP library, RFC 6855 may be the only standard that actually applies (i.e. the whole address is simply encoded as UTF-8), so the IDN/Punycode detail may not matter for IMAP searches. It is still worth keeping in mind: whether the UTF-8 form or the Punycode form matches likely depends on how the address appears in the message's From header.
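The manual Punycode conversion of the domain can be done with Python's built-in `idna` codec, which implements IDNA 2003 (some characters, such as ß, are handled differently under IDNA 2008). The helper `to_ascii_address` below is a hypothetical sketch, not part of imap-tools; it converts only the domain and refuses a non-ASCII local part, which has no Punycode form.

```python
def to_ascii_address(address):
    """Hypothetical helper: convert the domain of an email address to its
    IDNA ("xn--" Punycode) form, leaving the local part unchanged.

    Uses Python's built-in 'idna' codec (IDNA 2003).
    """
    local, sep, domain = address.rpartition('@')
    if not sep:
        raise ValueError(f'not an email address: {address!r}')
    if not local.isascii():
        # A non-ASCII local part cannot be expressed in Punycode (RFC 6530).
        raise ValueError(f'local part is not ASCII: {local!r}')
    ascii_domain = domain.encode('idna').decode('ascii')
    return f'{local}@{ascii_domain}'


print(to_ascii_address('beskeder@mød.dk'))  # beskeder@xn--md-lka.dk
```

For this domain the result matches the xn--md-lka.dk form given above.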

    This only covers the domain part of the address, not the local part. If the local part also contains characters outside the ASCII character set, there is no Punycode equivalent for it; those characters must be encoded into bytes as UTF-8, as per RFC 6530.
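Because it may not be known in advance whether a message's From header carries the UTF-8 form or the Punycode form, one hedged approach is to try both. The hypothetical helper `search_candidates` below lists the (charset, value) pairs worth trying; when the local part is non-ASCII (the address jørgen@mød.dk is purely illustrative), only the UTF-8 form remains.

```python
def search_candidates(address):
    """Hypothetical helper: return (charset, value) pairs worth trying
    as a FROM search value for the given address."""
    if address.isascii():
        return [('US-ASCII', address)]
    local, _, domain = address.rpartition('@')
    candidates = [('UTF-8', address)]  # raw UTF-8, RFC 6530/6855 style
    if local.isascii():
        # Only the domain is non-ASCII, so its Punycode form is plain ASCII.
        ascii_domain = domain.encode('idna').decode('ascii')
        candidates.append(('US-ASCII', f'{local}@{ascii_domain}'))
    return candidates


for addr in ('beskeder@mød.dk', 'jørgen@mød.dk'):
    print(addr, search_candidates(addr))
```

Each pair could then be tried in turn, e.g. `mailbox.fetch(imap_tools.AND(from_=value), charset=charset)`, stopping at the first one that finds messages; whether either form matches depends on the server and on how the header was stored.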

    Modern email libraries should be able to handle these requirements, but they are sometimes slow to adopt new standards, so workarounds such as manually encoding parts of an email address into the appropriate encoding(s) may be required.