python gmail

How to use python gmail API to (effectively) insert a hyperlink into an existing email (via insert/delete API functions) without it getting mangled?


I'm attempting to write a script capable of effectively adding a hyperlink to an existing email. I've already got a similar script working which effectively resizes down large images in existing emails in order to save space.

I'm editing an email by retrieving it, editing the retrieved copy, writing it back with the same threadId (although gmail assigns a new id), and deleting the email with the original id, so all that's left is the altered copy, in the same place as the original.

But when I try adding a hyperlink and viewing the end result in the gmail web interface, it's non-clickable, and what I see in gmail's show original function is this:

<a href"http://testurl.com">testurl text</a>

i.e. the equals sign after href is missing, everything else is fine.

After having set up the service as per the GMail python quickstart docs, and the query parameters in the variable named query, the code being executed is as per the below:

import base64
import email
from bs4 import BeautifulSoup

# get content of first email matching query criteria
results = service.users().messages().list(userId='me', q=query, maxResults=500).execute()
fullmsg = service.users().messages().get(userId='me', id=results['messages'][0]['id'], format='raw').execute()
unencoded = base64.urlsafe_b64decode(fullmsg['raw']).decode('utf-8')
mmsg = email.message_from_string(unencoded)

# find text/html part, and add link to start of html body using BeautifulSoup
for part in mmsg.walk():
    if part.get_content_type() == 'text/html':
        soup = BeautifulSoup(part.get_payload(), features='lxml')
        linkTag = soup.new_tag('a', href='http://testurl.com')
        linkTag.append("testurl text")
        soup.body.insert(0,linkTag)
        part.set_payload(str(soup))
encmsg = base64.urlsafe_b64encode(mmsg.as_string().encode('utf-8'))

# Add a label to indicate this email has been processed already (so query will exclude it from future searches) and write altered copy back to the server, deleting the original email
newLabels = fullmsg['labelIds'] + [doneLabelID]
fixedMsg = { 'raw' : encmsg.decode(), 'labelIds' : newLabels, 'threadId' : fullmsg['threadId'] }
response = service.users().messages().insert(userId='me', body=fixedMsg, internalDateSource='dateHeader').execute()
service.users().messages().delete(userId='me', id=results['messages'][0]['id']).execute()
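
A side note on the decode/encode steps in the code above: decoding the raw bytes as UTF-8 raises UnicodeDecodeError if the message contains bytes that are not valid UTF-8. A minimal sketch of a bytes-based round trip follows, using only the standard library. The helper names decode_raw_message and encode_raw_message are hypothetical and not part of the original script.

```python
import base64
import email


def decode_raw_message(raw_b64url):
    """Parse the base64url 'raw' field of a Gmail API message into a Message.

    Hypothetical helper: works on bytes, so no UTF-8 decoding is needed.
    """
    return email.message_from_bytes(base64.urlsafe_b64decode(raw_b64url))


def encode_raw_message(msg):
    """Serialise a Message back into a base64url string for the 'raw' field."""
    return base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
```

With these, mmsg = decode_raw_message(fullmsg['raw']) would stand in for the two decoding lines, and encode_raw_message(mmsg) for the encmsg line.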

pdb gives encouraging results when inspecting the altered body at various points, as does running the below after the above has been executed:

fromg = service.users().messages().get(userId='me', id=response['id'], format='raw').execute()
unenc = base64.urlsafe_b64decode(fromg['raw']).decode('utf-8')
gmmsg = email.message_from_string(unenc)
for part in gmmsg.walk():
    if part.get_content_type() == 'text/html':
        print(part)

At this point, after writing the copy back to the server and retrieving it again via the API, the anchor tag in the printed text does contain href='http://testurl.com'. But in the gmail web interface the link text ("testurl text") just appears as normal text, as mentioned above. In show original, the href attribute is missing its equals sign. When I forward the email, look at the message source in thunderbird and b64decode the encoded text/html segment, I see only the text that was enclosed by the anchor tags ("testurl text"); the anchor tags themselves are completely missing (presumably stripped by gmail because the href attribute is not well-formed?).

I've also tried without involving BeautifulSoup (inserting the anchor tags manually) in case it was introducing something weird, but that made no difference.

If anyone has any idea what is causing gmail to mangle these messages when viewed in or forwarded from the gmail web interface (but not to do so when the message is accessed via the API) all help would be much appreciated.


Edit: Issue is not apparent with a simple html email composed in the gmail web interface - that email can have a link added to it by the above code without the mangling. More complex html emails received from our partner business (which I was originally testing on) do exhibit this issue though.

The simple email that works has Content-Type: text/html; charset="UTF-8". The one from the partner business that doesn't work has Content-Type: text/html; charset=UTF-8\nContent-Transfer-Encoding: quoted-printable, and its html content looks odd wherever you would expect to see equals signs, e.g. <meta charset"utf-8" content'3D"text/html;' http-equiv'3D"Content-Type"= '/>. Not sure if this is part of the issue. I tried writing my link as <a href'3D"http://testurl.com"'>Test text</a> to mimic that, but while it showed up as written in show original, it still did not result in a clickable link.
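
For context on the quoted-printable header: in that encoding the equals sign is the escape character, so a literal = in the content is written as =3D. A small sketch using Python's standard quopri module (which the original script does not use) shows what the test link looks like once encoded:

```python
import quopri

# The link as it should appear in the decoded html
link = b'<a href="http://testurl.com">testurl text</a>'

# Quoted-printable form: the literal "=" becomes "=3D"
encoded = quopri.encodestring(link)
print(encoded)  # b'<a href=3D"http://testurl.com">testurl text</a>'

# Decoding restores the original markup
decoded = quopri.decodestring(encoded)
print(decoded == link)  # True
```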


Solution

  • It turns out the Content-Transfer-Encoding was key: the part is quoted-printable, where = is the escape character and a literal equals sign has to be written as =3D, so I needed to write the anchor tag as <a href=3D"http://testurl.com">test text</a>. Presumably the emails being sent by our partner businesses aren't displaying as intended because their attempt at correctly encoding equals signs ends up as '3D instead of =3D.
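
A more general variant of this fix, sketched under assumptions rather than tested against gmail: instead of hand-writing =3D, decode the part according to its Content-Transfer-Encoding, edit the decoded html, and store it back as quoted-printable. The function name insert_link_into_html_part is hypothetical, and a plain regex on the body tag stands in for the BeautifulSoup edit so that the sketch needs only the standard library. It may also avoid the '3D fragments, if those come from parsing html that is still encoded, but that is an unverified guess.

```python
import email
import quopri
import re


def insert_link_into_html_part(part, link_html):
    """Insert link_html right after the opening <body> tag of a text/html part.

    Hypothetical helper: decodes the payload per its Content-Transfer-Encoding,
    edits the decoded html, then stores it back as quoted-printable.
    """
    charset = part.get_content_charset() or "utf-8"
    # decode=True undoes quoted-printable or base64 and returns bytes
    html = part.get_payload(decode=True).decode(charset)

    # Simple stand-in for the BeautifulSoup edit: insert after <body ...>,
    # or at the very start if no body tag is found
    match = re.search(r"<body[^>]*>", html, flags=re.IGNORECASE)
    pos = match.end() if match else 0
    html = html[:pos] + link_html + html[pos:]

    # Re-encode so that every literal "=" is written as "=3D" again
    encoded = quopri.encodestring(html.encode(charset)).decode("ascii")
    part.set_payload(encoded)
    del part["Content-Transfer-Encoding"]  # no error if the header is absent
    part["Content-Transfer-Encoding"] = "quoted-printable"
```

In the loop from the question, a call such as insert_link_into_html_part(part, '<a href="http://testurl.com">testurl text</a>') would take the place of the BeautifulSoup and set_payload lines for each text/html part.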