Some time ago, I wrote a Python program that deals with email messages. One thing that keeps coming up is the need to know whether an email is "multipart" or not.
After a bit of research, I learned that it has something to do with emails containing HTML, attachments, and so on, but I didn't really understand it.
My usage of it was limited to 2 instances:
1. When I had to save the attachment from the raw email
I just found this on the internet (probably on here - sorry for not crediting the person who wrote it but I can't seem to find him again :/) and pasted it in my code
    import os

    def downloadAttachments(emailMsg, pathToSaveFile):
        """
        Save Attachments to pathToSaveFile (Example: pathToSaveFile = "C:\\Program Files\\")
        """
        att_path_list = []
        for part in emailMsg.walk():
            # multipart are just containers, so we skip them
            if part.get_content_maintype() == 'multipart':
                continue
            # is this part an attachment ?
            if part.get('Content-Disposition') is None:
                continue
            filename = part.get_filename()
            # a part can have a Content-Disposition but no filename
            if not filename:
                continue
            att_path = os.path.join(pathToSaveFile, filename)
            # Check if it's already there
            if not os.path.isfile(att_path):
                # finally write the stuff
                with open(att_path, 'wb') as fp:
                    fp.write(part.get_payload(decode=True))
            att_path_list.append(att_path)
        return att_path_list
2. When I had to get the text from the raw email
Also pasted from someone on the internet without really understanding how it works.
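    def get_text(emailMsg):
        """
        Output: body of the email (text content)
        """
        if emailMsg.is_multipart():
            # descend into the first sub-part until a leaf is reached
            return get_text(emailMsg.get_payload(0))
        else:
            # decode=True undoes the Content-Transfer-Encoding and returns bytes
            return emailMsg.get_payload(decode=True)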
All I really understand is that if the email message is multipart, its parts can be iterated over.
What exactly are these parts? How do you know which one is HTML for example? Or which one is an attachment? Or just the body?
An email message consists of a single MIME part, or a multipart structure with multiple MIME parts. If there is no multipart structure, the message is compatible with pre-MIME RFC822 messages, and the Content-type: etc headers are optional (if you don't spell out a content type and encoding, Content-type: text/plain; charset="us-ascii" and Content-transfer-encoding: 7bit are implied, but they are still good to spell out for human readers; before MIME, inferring the type and encoding of content was more of a wild west best-guess situation).
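To see the implied defaults in practice, here is a minimal sketch using Python's standard `email` package. The addresses and message text are made up for illustration; the point is only that a message without any Content-Type header is treated as a single text/plain part.

```python
from email import message_from_string

# A hypothetical pre-MIME style message: no Content-Type header at all.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
    "\n"
    "Just plain text.\n"
)
msg = message_from_string(raw)

print(msg.is_multipart())      # a single part, not a container
print(msg["Content-Type"])     # the header is genuinely absent
print(msg.get_content_type())  # the library falls back to the implied type
```

The header lookup returns `None` because nothing was spelled out, while `get_content_type()` reports the implied `text/plain`.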
There is no strict hierarchy or guidance for how exactly to use multipart messages. MIME simply defines a way to collect multiple payloads into a single email message. One of the original motivations I believe was to be able to embed pictures in text; but being able to attach binaries to a text message, and more generally, being able to create structured messages with payloads which are related in arbitrary ways is something which has simply been there for applications to use in whatever way they see fit.
A common misunderstanding is postulating a hierarchy into a "main part" and "subordinate" parts. It's certainly possible to create this structure, but it is by no means universally done. In fact, most multipart messages simply have a sequence of parts without any hierarchy. The user's email client will commonly pick one of the "inline" parts as the preferred "main" part to display in a message pane, but this is by no means dictated by the standard, or possible to enforce by the sending party.
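One way to get a feel for what the "parts" are is to build a message and list what `walk()` yields. This is only a sketch: `build_sample` and `list_parts` are hypothetical helper names, and the particular layout (an alternative container plus one attachment) is just one of many structures a sender might produce.

```python
from email.message import EmailMessage

def build_sample():
    """Build an illustrative message: two alternative bodies and one attachment."""
    msg = EmailMessage()
    msg["Subject"] = "sample"
    msg.set_content("plain body")                            # text/plain
    msg.add_alternative("<p>html body</p>", subtype="html")  # text/html
    msg.add_attachment(b"\x00\x01", maintype="application",
                       subtype="octet-stream", filename="data.bin")
    return msg

def list_parts(msg):
    """Return the content type of every part, containers included, in walk() order."""
    return [part.get_content_type() for part in msg.walk()]

for content_type in list_parts(build_sample()):
    print(content_type)
```

The containers (`multipart/*`) show up alongside the leaf parts; only the leaves carry actual payloads.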
Each MIME part has a set of headers which tell you the type, encoding, and disposition; for parts of type text/* the default disposition is "inline" (so it is often not explicitly spelled out) whereas most other parts have a default disposition of "attachment". You'll need to refer to the pertinent standards for a strict definition, but probably take it with a grain of salt, because many real-world applications are not particularly RFC-conformant.
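The default-disposition rule of thumb can be sketched as a small helper. `effective_disposition` is a hypothetical name, and the fallback it applies is the heuristic described above, not something the library does for you; it is meant for leaf parts, not for multipart containers.

```python
from email.message import EmailMessage

def effective_disposition(part):
    """Return the explicit disposition, or a heuristic default for a leaf part."""
    explicit = part.get_content_disposition()  # 'inline', 'attachment', or None
    if explicit is not None:
        return explicit
    # Heuristic from the text above: text/* defaults to inline, the rest to attachment.
    return "inline" if part.get_content_maintype() == "text" else "attachment"

# Illustrative parts with and without an explicit Content-Disposition header.
text_part = EmailMessage()
text_part.set_content("hello")

image_part = EmailMessage()
image_part.set_content(b"\x89PNG", maintype="image", subtype="png")

inline_image = EmailMessage()
inline_image.set_content(b"\x89PNG", maintype="image", subtype="png",
                         disposition="inline")

for part in (text_part, image_part, inline_image):
    print(part.get_content_type(), effective_disposition(part))
```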
For your concrete question, find the topmost leaf parts which are (implicitly or explicitly) inline, and display one which supports your use case as the "main" one. If you want to enforce HTML as the preferred format, you can do that; but many email applications defer this to the user to decide, and some users will definitely -- because of technical necessity, physical disabilities, or personal taste -- prefer plain-text when it's available.
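As a rough sketch of that selection, the hypothetical helper `pick_body` below takes the first non-attachment text leaf of each subtype in depth-first order and returns the one the caller prefers. It assumes the message is an `EmailMessage` (for example, parsed with `policy=email.policy.default`), since `get_content()` is not available on legacy `Message` objects.

```python
from email.message import EmailMessage

def pick_body(msg, preferred=("plain", "html")):
    """Return the text of the first inline text/* leaf matching the preference order."""
    candidates = {}
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body of their own
        if part.get_content_maintype() != "text":
            continue
        if part.get_content_disposition() == "attachment":
            continue  # explicitly marked as an attachment
        # Keep only the first part seen for each subtype.
        candidates.setdefault(part.get_content_subtype(), part)
    for subtype in preferred:
        if subtype in candidates:
            return candidates[subtype].get_content()
    return None

# Illustrative message with both a plain-text and an HTML body.
msg = EmailMessage()
msg.set_content("plain body")
msg.add_alternative("<p>html body</p>", subtype="html")

print(pick_body(msg))                               # plain text preferred
print(pick_body(msg, preferred=("html", "plain")))  # HTML preferred
```

The standard library also offers `EmailMessage.get_body(preferencelist=...)`, which implements a similar idea with its own rules; the manual loop is shown only to make the logic visible.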
Unfortunately, common practice by message producers recently has been to create a multipart/alternative container with text/plain and text/html members, but then provide a completely useless text/plain part and have all the actual content in a text/html part. The correct arrangement in this situation would be to simply not supply a text/plain part if you can't put anything useful in it (but I guess they only care about getting past some misguided spam filter, not about actually accommodating the preferences of the recipients).