I want to extract the list of "From" addresses in a Maildir folder. The following script shows the varying formats that are valid in a From: header:
import mailbox
mbox = mailbox.Maildir("/home/paul/Maildir/.folder")
for message in mbox:
    print(message["from"])
"John Smith" <jsmith@domain.com>
Tony <tony@domain2.com>
brendang@domain.net
All I need is the email address, for any valid (or common) "From:" field format. This must have been solved a crazillion times before, so I was expecting a library. All I can find is various regexes.
Is there a standard approach?
email.utils.parseaddr is your friend:
>>> emails = """"John Smith" <jsmith@domain.com>
Tony <tony@domain2.com>
brendang@domain.net"""
>>> lines = emails.splitlines()
>>> from email.utils import parseaddr
>>> [parseaddr(email)[1] for email in lines]
['jsmith@domain.com', 'tony@domain2.com', 'brendang@domain.net']
So you should just be able to work with:
for message in mbox:
    # parseaddr returns a (real name, address) tuple; [1] is the address
    print(parseaddr(message['from'])[1])
Then, if you just want the unique email addresses, you can build a set directly over mbox, e.g.:
import mailbox
from email.utils import parseaddr
mbox = mailbox.Maildir('/some/path')
uniq_emails = set(parseaddr(message['from'])[1] for message in mbox)
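If some messages have no From: header at all, message['from'] comes back as None, so it may be worth guarding against that. Below is a minimal, self-contained sketch of the same idea, assuming Python 3. The helper names unique_senders and build_demo_maildir are made up for illustration, and the demo writes a throwaway Maildir to a temporary directory rather than touching a real mailbox.

```python
import mailbox
import os
import tempfile
from email.utils import parseaddr


def unique_senders(maildir_path):
    """Return the set of bare From: addresses found in a Maildir.

    Hypothetical helper for illustration; not part of the standard library.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    senders = set()
    for message in box:
        # message["from"] is None when the header is missing; str() also
        # covers the case where the header comes back as a Header object.
        raw = str(message["from"] or "")
        address = parseaddr(raw)[1]
        if address:  # parseaddr yields '' when nothing could be parsed
            senders.add(address)
    return senders


def build_demo_maildir(maildir_path):
    """Create a small Maildir with sample messages (demo data only)."""
    box = mailbox.Maildir(maildir_path, create=True)
    samples = [
        'From: "John Smith" <jsmith@domain.com>\n\nbody\n',
        "From: Tony <tony@domain2.com>\n\nbody\n",
        "From: brendang@domain.net\n\nbody\n",
        "From: Tony <tony@domain2.com>\n\nduplicate sender\n",
        "Subject: no sender header\n\nbody\n",
    ]
    for text in samples:
        box.add(text)
    box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo")  # must not exist yet
        build_demo_maildir(path)
        for address in sorted(unique_senders(path)):
            print(address)
```

The duplicate sender collapses in the set, and the message without a From: header is skipped instead of contributing an empty string.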