email, email-parsing

Existing tool or code to identify quoted text in emails


I am looking for a way to identify quoted text in emails. The goal is to add something along the lines of Gmail's "show quoted text" feature to my web app, which involves a mail handler bot.

There are similar questions on Stack Overflow, but they ask for an algorithm. I could implement this myself if I had to, but I would greatly prefer a tried-and-true solution.

Requirements:

1) Support both HTML and plain text emails

2) Operate on the full thread (that is, it has the original text to compare the quoted text against, so there is no need to guess)

3) Handle common quote-related additions such as "On May 10th, 2008 at 6:35 PM Brandon wrote:"
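
To make requirement 2 concrete for the plain-text case, here is a minimal sketch of what "compare against the original" could look like. It assumes `>`-style quote markers and exact line matches after whitespace normalization. The names `normalize` and `mark_quoted` are made up for illustration and do not come from any existing library. HTML mail and re-wrapped quotes are not handled.

```python
import re

# Hypothetical helpers for illustration; not part of any existing library.
# Assumption: quoted lines are prefixed with one or more ">" markers.
QUOTE_PREFIX = re.compile(r"^\s*(?:>\s?)+")


def normalize(line):
    """Strip quote markers and collapse whitespace so lines can be compared."""
    line = QUOTE_PREFIX.sub("", line)
    return " ".join(line.split()).lower()


def mark_quoted(reply, original):
    """Return a list of (is_quoted, line) pairs, one per line of the reply.

    A reply line counts as quoted when its normalized form also appears
    as a line of the original message.
    """
    known = {normalize(line) for line in original.splitlines()}
    known.discard("")  # blank lines should never count as quoted
    return [(normalize(line) in known, line) for line in reply.splitlines()]


original = "Hi Bob,\nCan we meet at noon?"
reply = "Sure, noon works.\n\n> Hi Bob,\n> Can we meet at noon?"
for is_quoted, line in mark_quoted(reply, original):
    print("QUOTE" if is_quoted else "TEXT ", line)
```

Exact line matching breaks as soon as a mail client re-wraps the quoted text, so a real implementation would probably need fuzzier comparison.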

A Python library would be super magically awesome ideal, but I don't expect to get that lucky. A simple command-line tool that can do this would be pretty close to ideal, but I don't expect to get that lucky either. I'd gladly settle for a well-known, good implementation from an open source mail client that could reasonably be extracted into a tool.

Does anyone have a suggestion as to what my best bet would be?

I'm kind of surprised that there is no such thing as an "email handler bot construction kit".


Solution

  • Just following up on an email I received regarding this question.

    Sup has a bit of logic for accomplishing this that is pretty easy to understand, extract, and translate. I ported the relevant functions to Python and tweaked them for my purposes.

    Sup is a terminal-based mail client written in Ruby: http://sup.rubyforge.org/
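
    The ported code is not included here, so the following is only a rough sketch of the general line-based approach, not Sup's actual logic. It assumes plain-text bodies. The two regular expressions and the function name `chunk_lines` are illustrative guesses. Each line is classified as quote or text, an "On ... wrote:" line is folded into the quote that follows it, and consecutive lines of the same kind are grouped into chunks that a UI could collapse.

```python
import re
from itertools import groupby

# Assumed patterns, chosen for illustration only (not copied from Sup).
QUOTE_LINE = re.compile(r"^\s{0,4}[>|]")
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")


def chunk_lines(body):
    """Group consecutive lines into ("text", lines) or ("quote", lines) chunks."""
    lines = body.splitlines()
    kinds = []
    for i, line in enumerate(lines):
        is_quote = bool(QUOTE_LINE.match(line))
        # Treat an attribution line as part of the quote only when the
        # next non-blank line is itself a quoted line.
        if not is_quote and ATTRIBUTION.match(line):
            following = next((l for l in lines[i + 1:] if l.strip()), "")
            is_quote = bool(QUOTE_LINE.match(following))
        kinds.append("quote" if is_quote else "text")

    chunks = []
    for kind, group in groupby(zip(kinds, lines), key=lambda pair: pair[0]):
        chunks.append((kind, [line for _, line in group]))
    return chunks


body = (
    "Sounds good.\n"
    "\n"
    "On May 10th, 2008 at 6:35 PM Brandon wrote:\n"
    "> Lunch at noon?\n"
    "> My treat.\n"
    "\n"
    "Thanks!"
)
for kind, chunk in chunk_lines(body):
    print(kind, chunk)
```

    A "show quoted text" toggle would then render the "text" chunks normally and hide the "quote" chunks behind a link. Blank lines between two quoted blocks split them into separate chunks in this sketch, which may or may not be what you want.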