
Convert email HTML with nested blockquotes to plain text for e-mail (pandoc)?


I have the following problem: I write e-mails in plain text; somebody else replies in HTML, and their client converts the e-mail quote characters into <blockquote> tags; then I want to reply to that using plain text again. However, my webmail client converts only the first level of <blockquote> nesting into quote characters, so all deeper quote nesting levels are lost.

At first I thought I could work around this with Thunderbird. I start a new e-mail ("Write") in HTML format, open the Insert menu of the compose ("Write:") window, choose Insert/HTML and paste in the raw e-mail HTML (my webmail client has an option to copy the raw HTML of an HTML e-mail, which is great), and save this as a draft. Then I reply to this draft; my Thunderbird is set up to always reply in plain text. However, here too only the first <blockquote> level is converted back to quote characters, so the quoting/threading nesting levels are lost.

So I thought: maybe I can use pandoc for this conversion instead? And indeed, it does work. Here is an example HTML e-mail, which I saved as test.html:

<p>&nbsp;</p>
<div id="_rc_sig">&nbsp;</div>
<p>&nbsp;</p>
<p id="reply-intro">john@example.com skrev den 17.06.2020 22:30:</p>
<blockquote>
<div id="replybody1">
<div>
<div style="color: #000000; font-family: arial, helvetica, sans-serif; font-size: 12pt;">
<div>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc at blandit velit.</div>
<div>&nbsp;</div>
<hr id="v1zwchr" />
<div><strong>From: </strong>john@example.com<br /><strong>To: </strong>"Jack Jackson" &lt;jack@example.com&gt;<br /><strong>Cc: </strong>"Bob Bobson" &lt;bob@example.com&gt;, "Fred Fredson" &lt;fred@example.com&gt;, "Jim Jimson" &lt;jim@example.com&gt;<br /><strong>Sent: </strong>Wednesday, 17 June, 2020 22:24:47<br /><strong>Subject: </strong>Re: My Email Subject 01</div>
<div>&nbsp;</div>
<div>
<div style="color: #000000; font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt;">
<div>Hi there,</div>
<br />
<div>In hac habitasse platea dictumst. Sed nec purus leo. In est metus, tempor quis dapibus id, mollis ut est.</div>
<br />
<div>Morbi eu mauris sodales, sagittis nisl at, euismod quam. Proin mauris tortor, viverra eu erat eget, ultrices ornare justo. Donec lacinia nisi sit amet dolor semper posuere.</div>
<br />
<div>Pellentesque mattis, nisl quis scelerisque blandit, enim ipsum vestibulum diam, eu feugiat metus diam vel nisi. Etiam porttitor nisl ut ultrices feugiat.</div>
<br />
<div>Maecenas et neque at ante bibendum tempus vel a est. Sed vehicula urna augue, quis rutrum sapien congue et. Nullam ac elit quis metus ullamcorper placerat sed quis nunc:<br />1: Aliquam vestibulum lobortis dui, in mattis ipsum euismod sed.<br />2: Phasellus et fringilla tortor.<br />3: Donec diam nunc, aliquet a ultrices nec, interdum at dolor. <br />Fusce euismod finibus mi, sed viverra orci pulvinar non. Suspendisse in magna ut nunc finibus tempor eget et tortor.</div>
<br />
<div>Nullam sollicitudin sem id nibh placerat pellentesque. Ut fermentum pharetra venenatis. Maecenas vehicula, mauris a tincidunt vulputate, ante turpis finibus diam, a interdum ex sapien vel lorem. Nunc faucibus est eu eleifend venenatis. Mauris sed egestas nisi. Fusce suscipit tortor ac ultrices scelerisque. In hac habitasse platea dictumst.</div>
<br />
<div>Mauris tempor egestas nibh, a congue sapien tristique at. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla quam felis, tincidunt eget pretium ut, tempus vel tellus</div>
<br /><br />
<div>Etiam tincidunt risus sapien, eget tristique dolor dignissim a. Maecenas mi quam, auctor sit amet hendrerit id, finibus in ligula.</div>
<br />
<div>Cheers John</div>
<br /><hr id="v1zwchr" />
<div><strong>From: </strong>john@example.com<br /><strong>To: </strong>"Jack Jackson" &lt;jack@example.com&gt;<br /><strong>Cc: </strong>"Bob Bobson" &lt;bob@example.com&gt;, "Fred Fredson" &lt;fred@example.com&gt;, "Jim Jimson" &lt;jim@example.com&gt;<br /><strong>Sent: </strong>Wednesday, 17 June, 2020 21:41:05<br /><strong>Subject: </strong>Re: My Email Subject 01</div>
<br />
<div>
<div style="color: #000000; font-family: 'arial' , 'helvetica' , sans-serif; font-size: 12pt;">
<div>Phasellus non dolor pharetra turpis viverra varius non vitae lectus. Quisque egestas, diam quis viverra fringilla, ex urna consequat tortor, vel aliquet arcu purus sit amet nisi.</div>
<br /><hr id="v1zwchr" />
<div><strong>From: </strong>"Jack Jackson" &lt;jack@example.com&gt;<br /><strong>To: </strong>john@example.com, "Bob Bobson" &lt;bob@example.com&gt;<br /><strong>Cc: </strong>"Fred Fredson" &lt;fred@example.com&gt;, "Jim Jimson" &lt;jim@example.com&gt;<br /><strong>Sent: </strong>Wednesday, 17 June, 2020 21:34:53<br /><strong>Subject: </strong>Re: My Email Subject 01</div>
<br />
<div>
<p><span>Hi there</span></p>
<p><span>Suspendisse non nunc feugiat sapien pellentesque eleifend.<br />Quisque ipsum elit, volutpat eu mollis ac, hendrerit id ante..<br />Proin in nisi mi. Sed in lobortis risus. Donec sit amet ullamcorper mi.<br /></span></p>
<p><span>Jack<br /></span></p>
<p>&nbsp;</p>
<div class="v1moz-cite-prefix">Den 17-06-2020 kl. 21:28 skrev <a href="mailto:john@example.com" rel="noreferrer">john@example.com</a>:</div>
<blockquote>
<pre class="v1moz-quote-pre">Hi there,

Nulla eget diam nunc. Pellentesque in metus ligula.

Donec finibus erat id pharetra faucibus. Maecenas enim dui, semper eleifend nulla molestie, vulputate vulputate est.

Nam sit amet elit non dolor rhoncus pulvinar eget eu metus. Vestibulum hendrerit pretium nunc. Nullam diam massa, dictum a velit non, eleifend maximus nulla.

Vivamus vel congue nunc. In eget justo a lectus pulvinar facilisis.



----- Original Message -----
From: "Bob Bobson" <a href="mailto:bob@example.com" rel="noreferrer">&lt;bob@example.com&gt;</a>
To: <a href="mailto:john@example.com" rel="noreferrer">john@example.com</a>
Cc: "Jack Jackson" <a href="mailto:jack@example.com" rel="noreferrer">&lt;jack@example.com&gt;</a>, "Fred Fredson" <a href="mailto:fred@example.com" rel="noreferrer">&lt;fred@example.com&gt;</a>, "Jim Jimson" <a href="mailto:jim@example.com" rel="noreferrer">&lt;jim@example.com&gt;</a>
Sent: Wednesday, 17 June, 2020 16:40:54
Subject: Re: My Email Subject 01

Hi there,

Duis lacinia arcu sit amet aliquet sagittis. Integer tempor tortor eu ornare mattis. Morbi condimentum
auctor sodales. Maecenas ultrices leo at massa commodo sagittis.

Etiam justo est, mollis sed pellentesque quis, convallis nec ipsum. Nunc eget
nisl lacinia, ultricies magna ac, rutrum dui. Vestibulum vulputate ut lorem eu bibendum.


Vestibulum fermentum turpis est, a pulvinar tortor sollicitudin in. Fusce
tempor felis vel sem posuere, ac sodales justo suscipit. Nullam a orci ut ex
condimentum porta eget et eros. Maecenas congue erat ut nulla tempor pharetra.
Nullam velit quam, venenatis eget neque eget, porttitor consectetur
neque. Sed sed neque magna.

&aelig;nean ornare, diam eget porttitor vestibulum, enim nibh tempus enim,
a tempor nisl neque id velit.

</pre>
<blockquote>
<pre class="v1moz-quote-pre">Praesent eget vehicula urna, at vulputate elit. Aenean a ornare justo.
Vivamus
eu nunc consectetur, mattis ex nec, commodo metus.
denne v&aelig;rdi,
Donec consectetur et lectus vel elementum. Pellentesque eget pretium
enim.
</pre>
</blockquote>
<pre class="v1moz-quote-pre">Ah - Etiam tempor ultrices nisl, quis malesuada sem blandit vel.
Vivamus dignissim felis quis ante volutpat condimentum.?


Proin at sapien vitae enim pretium imperdiet non vitae metus.

Cheers,
Bob








</pre>
<blockquote>
<pre class="v1moz-quote-pre">Hi there,

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse neque augue,
viverra eu pharetra nec, volutpat id sapien. Vestibulum facilisis ligula nisl,
in dictum velit tristique in. Pellentesque sagittis et justo quis pretium. Lorem
ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse potenti.
Pellentesque cursus accumsan urna, eu ultricies ipsum tincidunt sed.

Cheers John
-------------------------

From: "Jack Jackson" <a href="mailto:jack@example.com" rel="noreferrer">&lt;jack@example.com&gt;</a>
To: "John Johnson" <a href="mailto:john@example.com" rel="noreferrer">&lt;john@example.com&gt;</a>
Cc: "Bob Bobson" <a href="mailto:bob@example.com" rel="noreferrer">&lt;bob@example.com&gt;</a>
Sent: Wednesday, 17 June, 2020 13:40:56
Subject: Re: My Email Subject 01

Hello there

Sed consectetur arcu ut facilisis interdum.

Nunc ante libero, faucibus vel ultricies sit amet, gravida nec
leo. Donec euismod risus ac leo efficitur, blandit pretium turpis feugiat.
Nam placerat, lectus quis consectetur malesuada, turpis ante aliquam velit,
in fermentum dolor ligula at eros.

Cheers Jack

Den 17-06-2020 kl. 13:31 skrev <a href="mailto:john@example.com" rel="noreferrer">john@example.com</a>:

</pre>
<blockquote>
<pre class="v1moz-quote-pre">Hi there,

Nunc eget metus eu ex maximus vehicula. Donec pretium ex vel felis condimentum,
eget pretium ante pulvinar. Pellentesque sed eros vitae ante lobortis venenatis
ut in nulla. Praesent a facilisis metus, et dignissim elit. Duis quis dui risus.

Donec et nunc at urna accumsan molestie. Praesent ultricies molestie metus at
venenatis. Curabitur mattis dolor laoreet, porttitor odio id, porttitor nisl.
Suspendisse finibus est ac sem lacinia dignissim. Donec vel pellentesque magna.
Pellentesque a volutpat ante. Vivamus urna mi, aliquet in tortor eget,
malesuada blandit turpis.

Best Regards

John
</pre>
</blockquote>
</blockquote>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>

The best result I got from pandoc was with this command line:

pandoc -f html -t markdown-raw_html-native_divs-native_spans-fenced_divs-bracketed_spans-smart-escaped_line_breaks test.html -o test.txt

... and in that case, I get this output:

 

 

 

john\@example.com skrev den 17.06.2020 22:30:

> Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc at
> blandit velit.
>
>  
>
> ------------------------------------------------------------------------
>
> **From:** john\@example.com
> **To:** "Jack Jackson" \<jack\@example.com\>
> **Cc:** "Bob Bobson" \<bob\@example.com\>, "Fred Fredson"
> \<fred\@example.com\>, "Jim Jimson" \<jim\@example.com\>
> **Sent:** Wednesday, 17 June, 2020 22:24:47
> **Subject:** Re: My Email Subject 01
>
>  
>
> Hi there,
>
>
>
> In hac habitasse platea dictumst. Sed nec purus leo. In est metus,
> tempor quis dapibus id, mollis ut est.
>
>
>
> Morbi eu mauris sodales, sagittis nisl at, euismod quam. Proin mauris
> tortor, viverra eu erat eget, ultrices ornare justo. Donec lacinia
> nisi sit amet dolor semper posuere.
>
>
>
> Pellentesque mattis, nisl quis scelerisque blandit, enim ipsum
> vestibulum diam, eu feugiat metus diam vel nisi. Etiam porttitor nisl
> ut ultrices feugiat.
>
>
>
> Maecenas et neque at ante bibendum tempus vel a est. Sed vehicula urna
> augue, quis rutrum sapien congue et. Nullam ac elit quis metus
> ullamcorper placerat sed quis nunc:
> 1: Aliquam vestibulum lobortis dui, in mattis ipsum euismod sed.
> 2: Phasellus et fringilla tortor.
> 3: Donec diam nunc, aliquet a ultrices nec, interdum at dolor.
> Fusce euismod finibus mi, sed viverra orci pulvinar non. Suspendisse
> in magna ut nunc finibus tempor eget et tortor.
>
>
>
> Nullam sollicitudin sem id nibh placerat pellentesque. Ut fermentum
> pharetra venenatis. Maecenas vehicula, mauris a tincidunt vulputate,
> ante turpis finibus diam, a interdum ex sapien vel lorem. Nunc
> faucibus est eu eleifend venenatis. Mauris sed egestas nisi. Fusce
> suscipit tortor ac ultrices scelerisque. In hac habitasse platea
> dictumst.
>
>
>
> Mauris tempor egestas nibh, a congue sapien tristique at. Lorem ipsum
> dolor sit amet, consectetur adipiscing elit. Nulla quam felis,
> tincidunt eget pretium ut, tempus vel tellus
>
>
>
>
> Etiam tincidunt risus sapien, eget tristique dolor dignissim a.
> Maecenas mi quam, auctor sit amet hendrerit id, finibus in ligula.
>
>
>
> Cheers John
>
>
>
> ------------------------------------------------------------------------
>
> **From:** john\@example.com
> **To:** "Jack Jackson" \<jack\@example.com\>
> **Cc:** "Bob Bobson" \<bob\@example.com\>, "Fred Fredson"
> \<fred\@example.com\>, "Jim Jimson" \<jim\@example.com\>
> **Sent:** Wednesday, 17 June, 2020 21:41:05
> **Subject:** Re: My Email Subject 01
>
>
>
> Phasellus non dolor pharetra turpis viverra varius non vitae lectus.
> Quisque egestas, diam quis viverra fringilla, ex urna consequat
> tortor, vel aliquet arcu purus sit amet nisi.
>
>
>
> ------------------------------------------------------------------------
>
> **From:** "Jack Jackson" \<jack\@example.com\>
> **To:** john\@example.com, "Bob Bobson" \<bob\@example.com\>
> **Cc:** "Fred Fredson" \<fred\@example.com\>, "Jim Jimson"
> \<jim\@example.com\>
> **Sent:** Wednesday, 17 June, 2020 21:34:53
> **Subject:** Re: My Email Subject 01
>
>
>
> Hi there
>
> Suspendisse non nunc feugiat sapien pellentesque eleifend.
> Quisque ipsum elit, volutpat eu mollis ac, hendrerit id ante..
> Proin in nisi mi. Sed in lobortis risus. Donec sit amet ullamcorper
> mi.
>
> Jack
>
>  
>
> Den 17-06-2020 kl. 21:28 skrev <john@example.com>:
>
> > ``` {.v1moz-quote-pre}
> > Hi there,
> >
> > Nulla eget diam nunc. Pellentesque in metus ligula.
> >
> > Donec finibus erat id pharetra faucibus. Maecenas enim dui, semper eleifend nulla molestie, vulputate vulputate est.
> >
> > Nam sit amet elit non dolor rhoncus pulvinar eget eu metus. Vestibulum hendrerit pretium nunc. Nullam diam massa, dictum a velit non, eleifend maximus nulla.
> >
> > Vivamus vel congue nunc. In eget justo a lectus pulvinar facilisis.
> >
> >
> >
> > ----- Original Message -----
> > From: "Bob Bobson" <bob@example.com>
> > To: john@example.com
> > Cc: "Jack Jackson" <jack@example.com>, "Fred Fredson" <fred@example.com>, "Jim Jimson" <jim@example.com>
> > Sent: Wednesday, 17 June, 2020 16:40:54
> > Subject: Re: My Email Subject 01
> >
> > Hi there,
> >
> > Duis lacinia arcu sit amet aliquet sagittis. Integer tempor tortor eu ornare mattis. Morbi condimentum
> > auctor sodales. Maecenas ultrices leo at massa commodo sagittis.
> >
> > Etiam justo est, mollis sed pellentesque quis, convallis nec ipsum. Nunc eget
> > nisl lacinia, ultricies magna ac, rutrum dui. Vestibulum vulputate ut lorem eu bibendum.
> >
> >
> > Vestibulum fermentum turpis est, a pulvinar tortor sollicitudin in. Fusce
> > tempor felis vel sem posuere, ac sodales justo suscipit. Nullam a orci ut ex
> > condimentum porta eget et eros. Maecenas congue erat ut nulla tempor pharetra.
> > Nullam velit quam, venenatis eget neque eget, porttitor consectetur
> > neque. Sed sed neque magna.
> >
> > ænean ornare, diam eget porttitor vestibulum, enim nibh tempus enim,
> > a tempor nisl neque id velit.
> > ```
> >
> > > ``` {.v1moz-quote-pre}
> > > Praesent eget vehicula urna, at vulputate elit. Aenean a ornare justo.
> > > Vivamus
> > > eu nunc consectetur, mattis ex nec, commodo metus.
> > > denne værdi,
> > > Donec consectetur et lectus vel elementum. Pellentesque eget pretium
> > > enim.
> > > ```
> >
> > ``` {.v1moz-quote-pre}
> > Ah - Etiam tempor ultrices nisl, quis malesuada sem blandit vel.
> > Vivamus dignissim felis quis ante volutpat condimentum.?
> >
> >
> > Proin at sapien vitae enim pretium imperdiet non vitae metus.
> >
> > Cheers,
> > Bob
> >
> >
> >
> >
> >
> >
> >
> > ```
> >
> > > ``` {.v1moz-quote-pre}
> > > Hi there,
> > >
> > > Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse neque augue,
> > > viverra eu pharetra nec, volutpat id sapien. Vestibulum facilisis ligula nisl,
> > > in dictum velit tristique in. Pellentesque sagittis et justo quis pretium. Lorem
> > > ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse potenti.
> > > Pellentesque cursus accumsan urna, eu ultricies ipsum tincidunt sed.
> > >
> > > Cheers John
> > > -------------------------
> > >
> > > From: "Jack Jackson" <jack@example.com>
> > > To: "John Johnson" <john@example.com>
> > > Cc: "Bob Bobson" <bob@example.com>
> > > Sent: Wednesday, 17 June, 2020 13:40:56
> > > Subject: Re: My Email Subject 01
> > >
> > > Hello there
> > >
> > > Sed consectetur arcu ut facilisis interdum.
> > >
> > > Nunc ante libero, faucibus vel ultricies sit amet, gravida nec
> > > leo. Donec euismod risus ac leo efficitur, blandit pretium turpis feugiat.
> > > Nam placerat, lectus quis consectetur malesuada, turpis ante aliquam velit,
> > > in fermentum dolor ligula at eros.
> > >
> > > Cheers Jack
> > >
> > > Den 17-06-2020 kl. 13:31 skrev john@example.com:
> > > ```
> > >
> > > > ``` {.v1moz-quote-pre}
> > > > Hi there,
> > > >
> > > > Nunc eget metus eu ex maximus vehicula. Donec pretium ex vel felis condimentum,
> > > > eget pretium ante pulvinar. Pellentesque sed eros vitae ante lobortis venenatis
> > > > ut in nulla. Praesent a facilisis metus, et dignissim elit. Duis quis dui risus.
> > > >
> > > > Donec et nunc at urna accumsan molestie. Praesent ultricies molestie metus at
> > > > venenatis. Curabitur mattis dolor laoreet, porttitor odio id, porttitor nisl.
> > > > Suspendisse finibus est ac sem lacinia dignissim. Donec vel pellentesque magna.
> > > > Pellentesque a volutpat ante. Vivamus urna mi, aliquet in tortor eget,
> > > > malesuada blandit turpis.
> > > >
> > > > Best Regards
> > > >
> > > > John
> > > > ```

... which is great, because the nested blockquotes are finally preserved as quote characters in the plain text output ...

However, I do not like the leftover Markdown formatting, such as ``` {.v1moz-quote-pre} or **From:**, or escapes such as "Jack Jackson" \<jack\@example.com\>.

So, are there pandoc settings that produce e-mail-style plain text from e-mail HTML, without Markdown-specific formatting or backslash escapes of angle brackets and the @ character, but which still turn nested <blockquote> elements into nested quote characters, as in the example above?

Failing that, is there some other software (or even an online app or web page) that can convert e-mail HTML to plain-text e-mail while preserving nested quotes?


Solution

  • You can use a Lua filter to remove all document elements you do not want to keep. For example, to remove most inline markup (bold, links, spans and so on) while keeping its text:

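    -- Replace every inline element that has a `content` field (Strong,
    -- Emph, Span, Link, ...) by that content, dropping the markup.
    -- Notes are skipped because their content is blocks, not inlines.
    function Inline (inln)
      if inln.t ~= 'Note' and inln.content then
        return inln.content
      end
      return inln
    end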
    

    Unwanted escapes can be avoided by turning every word into raw Markdown, which the Markdown writer outputs verbatim:

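    -- Emit every word verbatim so the writer does not escape <, > or @.
    -- Str takes precedence over the generic Inline function above.
    function Str (s)
      return pandoc.RawInline('markdown', s.text)
    end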
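Putting these ideas together, a complete filter file might look like the sketch below. It has not been checked against real pandoc output and rests on a few assumptions. The file name `email-quote.lua` is hypothetical. The `CodeBlock` function, which is not part of the answer above, replaces the `<pre class="v1moz-quote-pre">` blocks with raw text so that the ``` {.v1moz-quote-pre} fences disappear. This relies on the expectation that the Markdown writer still prefixes raw lines inside block quotes with `> `, just as it does for the code blocks in the question's output. The `Para`/`Plain` functions, also added here, drop paragraphs that contain only `&nbsp;` or line breaks.

```lua
-- email-quote.lua (hypothetical name): sketch of a pandoc Lua filter that
-- strips Markdown artifacts while keeping nested block quotes.
-- The `pandoc` module is provided by pandoc when the filter runs.

local NBSP = '\194\160'  -- UTF-8 bytes of U+00A0 (&nbsp;)

-- True if a string holds only whitespace and non-breaking spaces.
local function is_blank(text)
  return text:gsub(NBSP, ''):match('^%s*$') ~= nil
end

-- True if a list of inlines contains nothing but spaces and line breaks.
local function only_breaks(inlines)
  for _, el in ipairs(inlines) do
    if el.t ~= 'Space' and el.t ~= 'SoftBreak' and el.t ~= 'LineBreak' then
      return false
    end
  end
  return true
end

-- Drop inline markup (Strong, Emph, Span, Link, ...) but keep its text.
-- Notes are left alone because their content is blocks, not inlines.
function Inline(el)
  if el.t ~= 'Note' and el.content then
    return el.content
  end
  return el
end

-- Emit words verbatim so the writer does not escape <, > or @.
-- Words made only of &nbsp; are deleted (an empty list removes them).
function Str(el)
  if is_blank(el.text) then
    return {}
  end
  return pandoc.RawInline('markdown', el.text)
end

-- Turn <pre> blocks into raw text: no ``` fences, no {.class} attributes,
-- and trailing blank lines trimmed. Enclosing quotes still add "> ".
function CodeBlock(el)
  local text = el.text:gsub('%s+$', '')
  return pandoc.RawBlock('markdown', text)
end

-- Remove paragraphs left empty (or holding only breaks) after Str ran;
-- inline functions run before block functions in pandoc's default order.
function Para(el)
  if only_breaks(el.content) then
    return {}
  end
end

Plain = Para
```

To try it, add `--lua-filter=email-quote.lua` to the pandoc command from the question and keep the Markdown writer, which is what produces the `>` prefixes. Adding `--wrap=preserve` should keep the original line breaks instead of re-wrapping paragraphs at 72 columns. Horizontal rules (`<hr>`) will probably still appear as a line of dashes.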