I am using a PHP script to automatically download emails from an IMAP server. The HTML of the email body looks something like this:
<!DOCTYPE html><html lang=3D"en" xmlns=3D"http://www.w3.org/1999/xhtml" xml=
ns:v=3D"urn:schemas-microsoft-com:vml" xmlns:o=3D"urn:schemas-microsoft-com=
:office:office"><head><meta http-equiv=3D"Content-Type" content=3D"text/htm=
l; charset=3Dutf-8"><meta name=3D"viewport" content=3D"width=3Ddevice-width=
,initial-scale=3D1"><meta http-equiv=3D"X-UA-Compatible" content=3D"IE=3Ded=
ge"><meta name=3D"x-apple-disable-message-reformatting"><meta name=3D"forma=
This all looks like proper HTML to me, except for the line breaks with the '=' at the end, but I am not 100% sure that nothing else has been changed by the way HTML is stored in an email.
What would be the best way to remove the '=' at the end of each line where required, and to undo any other changes that happen when HTML is stored in an email body, ideally in PHP or on the Linux command line? I eventually want to save this as an HTML file so that it can be viewed in a browser or converted into other formats.
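The encoding used here is called "quoted-printable", one of the MIME content transfer encodings. A '=' at the end of a line is a soft line break that is dropped on decoding, and "=3D" is an encoded '=' character. The PHP function to decode it is quoted_printable_decode().
A minimal call to solve this issue would be: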
$decoded_data = quoted_printable_decode($text);
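As a slightly fuller sketch of the same idea: a mail body is only quoted-printable if its Content-Transfer-Encoding header says so, so it can help to branch on that value before decoding. The helper name decode_mail_body() and the sample string below are made up for illustration and are not part of PHP or of the original script. It assumes you have already extracted the raw body and its encoding name from the message.

```php
<?php
declare(strict_types=1);

/**
 * Decode a MIME body according to its Content-Transfer-Encoding value.
 * Hypothetical helper for illustration; the name is an assumption.
 */
function decode_mail_body(string $body, string $encoding): string
{
    switch (strtolower(trim($encoding))) {
        case 'quoted-printable':
            // Drops "=" soft line breaks and turns sequences like =3D back into "=".
            return quoted_printable_decode($body);
        case 'base64':
            $decoded = base64_decode($body);
            // Fall back to the untouched body if decoding fails.
            return $decoded === false ? $body : $decoded;
        default:
            // 7bit, 8bit and binary bodies are stored without transformation.
            return $body;
    }
}

// Shortened sample in the style of the question (assumed data).
$raw = "<meta name=3D\"viewport\" content=3D\"width=3Ddevice-width=\r\n,initial-scale=3D1\">";

echo decode_mail_body($raw, 'quoted-printable'), PHP_EOL;
// prints: <meta name="viewport" content="width=device-width,initial-scale=1">
```

The decoded string can then be written out with file_put_contents() to get an HTML file a browser can open. Note that this only undoes the transfer encoding; the charset declared in the message (utf-8 in the sample above) is left as it is.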