python rfc2822

How to get message body (or bodies) from Message object returned by email.parser.Parser?


I'm reading the Python 3 docs here and I must be blind or something... Where does it say how to get the body of a message?

What I want to do is open a message and loop over its text-based bodies, skipping binary attachments. Pseudocode:

def read_all_bodies(local_email_file):
    email = Parser().parse(open(local_email_file, 'r'))
    for pseudo_body in email.pseudo_bodies:
        if pseudo_body.pseudo_is_binary():
            continue
        # Pseudo-parse the body here

How do I do that? Is the Message class even the correct class for this? Isn't it only for headers?


Solution

  • This is best done using two functions:

    1. One to open the file and parse the email. If the message is single-part, get_payload returns the body as a string; if the message is multipart, it returns a list of sub-messages.
    2. A second to handle the text of each payload.

    This is how it can be done:

    from email.parser import Parser

    def parse_file_bodies(filename):
        # Open the file and parse the email; the with block closes the file
        with open(filename, 'r') as f:
            email = Parser().parse(f)
        # For multipart emails, each body is handled in a loop
        if email.is_multipart():
            for msg in email.get_payload():
                parse_single_body(msg)
        else:
            # A single-part message is passed directly
            parse_single_body(email)
    
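    def parse_single_body(email):
        # decode=True undoes base64/quoted-printable and returns bytes
        # (or None if this part is itself multipart)
        payload = email.get_payload(decode=True)
        if payload is None:
            return
        # The bytes must be converted to a Python string using the
        # charset declared by the part; fall back to UTF-8 if none is given
        charset = email.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset)
            # Now you can work with text as with any other string:
            ...
        except (UnicodeDecodeError, LookupError):
            print("Error: cannot decode message as " + charset)
            return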
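
The two functions above look only one level deep and do not check the content type, so a nested multipart or a binary attachment would still reach the decoding step. As a possible alternative, here is a minimal sketch that uses Message.walk() to visit every part and keeps only text/* parts that are not attachments. The helper name iter_text_bodies and the SAMPLE message are made up for illustration; falling back to UTF-8 when no charset is declared is an assumption, not something the email package prescribes.

```python
from email import message_from_string

# Hypothetical sample message: one text part plus one binary attachment
SAMPLE = """\
From: alice@example.com
To: bob@example.com
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="utf-8"

Hello, plain text.
--XYZ
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="blob.bin"

AAECAw==
--XYZ--
"""

def iter_text_bodies(msg):
    """Yield the decoded text of every text/* part, skipping attachments."""
    for part in msg.walk():
        # multipart/* containers hold no body of their own
        if part.is_multipart():
            continue
        # Skip anything that is not text/*, e.g. images or binary blobs
        if part.get_content_maintype() != "text":
            continue
        # Skip text files that were sent as attachments
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True)  # bytes
        # Assumption: fall back to UTF-8 when the part declares no charset
        charset = part.get_content_charset() or "utf-8"
        yield payload.decode(charset, errors="replace")

msg = message_from_string(SAMPLE)
bodies = list(iter_text_bodies(msg))
for body in bodies:
    print(body)
```

To read from a file instead, the same generator can be fed the Message returned by Parser().parse(f). Because walk() also descends into nested containers such as multipart/alternative, both the plain-text and the HTML version of a body would be yielded; filter on get_content_type() if only one of them is wanted.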