python email maildir

Determine unique "from" email addresses in Maildir folder


I want to build a list of the unique "From" addresses in a Maildir folder. The following script shows the varying formats that are valid in a From: header:

import mailbox

mbox = mailbox.Maildir("/home/paul/Maildir/.folder") 
for message in mbox:
    print(message["from"])

"John Smith" <jsmith@domain.com>
Tony <tony@domain2.com>
brendang@domain.net

All I need is the email address, for any valid (or common) "From:" field format. This must have been solved a crazillion times before, so I was expecting a library. All I can find is various regexes.

Is there a standard approach?


Solution

  • email.utils.parseaddr is your friend:

    >>> emails = """"John Smith" <jsmith@domain.com>
    Tony <tony@domain2.com>
    brendang@domain.net"""
    >>> lines = emails.splitlines()
    >>> from email.utils import parseaddr
    >>> [parseaddr(email)[1] for email in lines]
    ['jsmith@domain.com', 'tony@domain2.com', 'brendang@domain.net']
    

    parseaddr returns a (name, address) tuple, so take element [1] to get just the address:

    for message in mbox:
        print(parseaddr(message['from'])[1])
    

    If you only want the unique email addresses, you can build a set directly over mbox, e.g. (note the class is spelled Maildir, not MailDir):

    import mailbox
    from email.utils import parseaddr
    mbox = mailbox.Maildir('/some/path')
    uniq_emails = set(parseaddr(email['from'])[1] for email in mbox)
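A slightly more defensive variant of the set approach, as a rough sketch rather than a definitive implementation: it skips messages with no From header and uses email.utils.getaddresses, which also copes with a header that lists more than one address. The function name unique_from_addresses is made up for illustration, and lowercasing the addresses is an assumption (the local part of an address is technically case-sensitive, so drop the .lower() call if that matters to you).

```python
from email.utils import getaddresses


def unique_from_addresses(messages):
    """Collect the distinct addresses found in the From headers of messages.

    `messages` is any iterable of email.message.Message objects,
    for example a mailbox.Maildir instance.
    """
    found = set()
    for message in messages:
        # get_all returns every From header, or the empty list if there is none.
        headers = [str(h) for h in message.get_all("from", [])]
        # getaddresses yields (name, address) pairs for all addresses listed.
        for _name, address in getaddresses(headers):
            if address:
                # Assumption: treat addresses as case-insensitive.
                found.add(address.lower())
    return found
```

With the folder from the question this would be called as unique_from_addresses(mailbox.Maildir("/home/paul/Maildir/.folder", create=False)). Passing create=False stops the mailbox module from creating the directory if the path is mistyped.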