python regex roundup

What would be a good regexp for identifying the "original message" prefix in Gmail?


An example prefix looks like this:

On Tue, Mar 20, 2012 at 2:38 PM, Johnny Walker <johnny.talker@gmail.com> wrote:

And then follows the quoted reply. I do have a distinct feeling this is locale-specific though, which makes me a sad programmer.

The reason I ask is that Roundup doesn't strip these prefixes correctly when I reply to an issue through Gmail. I think origmsg_re is the config.ini variable I need to set, alongside keep_quoted_text = no, to fix this.

Right now it's the default origmsg_re = ^[>|\s]*-----\s?Original Message\s?-----$

Edit: Now I'm using origmsg_re = ^On[^<]+<.+@.+>[ \n]wrote:[\n] which also works with some Gmail clients that wrap lines that are too long.
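
To sanity-check the two patterns above outside of Roundup, a small script along these lines can help. It is only a sketch: the sample message bodies are made up for illustration, and compiling with re.MULTILINE is an assumption about how the pattern is applied, not something taken from Roundup's source.

```python
import re

# Patterns copied from the question. re.MULTILINE is an assumption made
# for this sketch so that ^ and $ can match at line boundaries.
DEFAULT_RE = re.compile(r"^[>|\s]*-----\s?Original Message\s?-----$", re.MULTILINE)
GMAIL_RE = re.compile(r"^On[^<]+<.+@.+>[ \n]wrote:[\n]", re.MULTILINE)

# Hypothetical reply bodies: one with the prefix on a single line,
# one where the mail client wrapped the line before "wrote:".
single_line = (
    "Thanks, fixed.\n\n"
    "On Tue, Mar 20, 2012 at 2:38 PM, Johnny Walker <johnny.talker@gmail.com> wrote:\n"
    "> quoted reply\n"
)
wrapped = (
    "Thanks, fixed.\n\n"
    "On Tue, Mar 20, 2012 at 2:38 PM, Johnny Walker <johnny.talker@gmail.com>\n"
    "wrote:\n"
    "> quoted reply\n"
)

for name, body in (("single line", single_line), ("wrapped", wrapped)):
    print(name,
          "default:", bool(DEFAULT_RE.search(body)),
          "gmail:", bool(GMAIL_RE.search(body)))
```

The default pattern finds nothing in either body, while the edited pattern matches both, because [^<]+ and [ \n] are allowed to cross a line break.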


Solution

  • The following regex will match Gmail's prefix in a fairly safe manner. It requires three commas and the literal text On ... wrote:

    On([^,]+,){3}.*?wrote:
    

    If the regex should match case-insensitively, don't forget to add the re.IGNORECASE flag.

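    import re

    subject = "On Tue, Mar 20, 2012 at 2:38 PM, Johnny Walker <johnny.talker@gmail.com> wrote:"
    if re.search(r"On([^,]+,){3}.*?wrote:", subject, re.IGNORECASE):
        print("Successful match")
    else:
        print("Match attempt failed")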
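
One caveat worth checking: in Python, "." does not match a line break by default, so the pattern above misses a prefix that the mail client wrapped before "wrote:". The sketch below shows the difference re.DOTALL makes. The wrapped sample is made up, and using re.DOTALL is a suggestion for illustration rather than part of the original answer. Note that with re.DOTALL the lazy ".*?" may run across several lines until it finds "wrote:".

```python
import re

PATTERN = r"On([^,]+,){3}.*?wrote:"

# Hypothetical sample: the prefix is wrapped after the email address.
wrapped = (
    "On Tue, Mar 20, 2012 at 2:38 PM, Johnny Walker <johnny.talker@gmail.com>\n"
    "wrote:\n"
    "> quoted reply\n"
)

# Without re.DOTALL, ".*?" stops at the line break and the search fails.
plain = re.search(PATTERN, wrapped)

# With re.DOTALL, "." also matches "\n", so the wrapped prefix is found.
dotall = re.search(PATTERN, wrapped, re.DOTALL)

print("without DOTALL:", plain is not None)
print("with DOTALL:", dotall is not None)
```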
    

    Kind Regards, Buckley

    Match the characters “On” literally «On»
    Match the regular expression below and capture its match into backreference number 1 «([^,]+,){3}»
       Exactly 3 times «{3}»
       Note: You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «{3}»
       Match any character that is NOT a “,” «[^,]+»
          Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
       Match the character “,” literally «,»
    Match any single character that is not a line break character «.*?»
       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
    Match the characters “wrote:” literally «wrote:»
    
    Created with RegexBuddy
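
The note about the repeated capturing group can be seen directly in Python. The following is a small sketch using the sample line from the question; the second pattern, with a non-capturing inner group wrapped in an outer capturing group, is one way to follow that advice and is not part of the original answer.

```python
import re

line = "On Tue, Mar 20, 2012 at 2:38 PM, Johnny Walker <johnny.talker@gmail.com> wrote:"

# Repeating the capturing group itself: group 1 keeps only the last iteration.
repeated = re.search(r"On([^,]+,){3}.*?wrote:", line)
last_iteration = repeated.group(1)

# Capturing group around the repeated (non-capturing) group: group 1 keeps all three.
outer = re.search(r"On((?:[^,]+,){3}).*?wrote:", line)
all_iterations = outer.group(1)

print(repr(last_iteration))   # ' 2012 at 2:38 PM,'
print(repr(all_iterations))   # ' Tue, Mar 20, 2012 at 2:38 PM,'
```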