markdown pandoc html-parser

How can I skip elements with a specific tag during HTML-to-Markdown conversion?


When converting HTML to Markdown, I want to leave some specific elements unconverted. Let's say I don't want to convert svg tags:

Input (HTML format):

<p><strong>one-to-many</strong> – where the ‘many’ side can be <strong>zero or more</strong> (an optional relationship) or <strong>one or more</strong> (a mandatory relationship).</p>
<svg xmlns="http://www.w3.org/2000/svg" height="248" width="693" viewBox="-197 0 866.2499999999993 309.9999999999998">
        ...
</svg>

Expected result (Markdown format):

**one-to-many** – where the ‘many’ side can be zero or more (an optional relationship) or one or more (a mandatory relationship).
<svg xmlns="http://www.w3.org/2000/svg" height="248" width="693" viewBox="-197 0 866.2499999999993 309.9999999999998">
...
</svg>

I could extract these elements with an HTML parser, convert the rest of the document to Markdown, and then put the elements back into the Markdown document where I want them. But I am wondering: is there any way to do this with pandoc commands alone?


Solution

  • You can tell pandoc's HTML reader to keep HTML it does not convert as raw HTML by enabling the raw_html extension:

    pandoc -f html+raw_html -t markdown
    

    If you want to customize pandoc's behaviour further, you could write a pandoc filter.
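As a rough illustration of the filter idea, here is a minimal sketch of a JSON filter in Python that keeps raw HTML only between an opening and a closing svg tag and drops all other raw HTML. It rests on several assumptions that are not confirmed by the answer above: that the reader emits the markup as RawBlock/RawInline nodes with format "html" (possibly one node per tag), that svg elements are not nested or self-closing, and that the script is saved under the hypothetical name keep_svg.py. The names KEEP_TAG, is_raw_html, keep and walk are made up for this example.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter: keep raw HTML only inside <svg>...</svg>."""
import json
import sys

KEEP_TAG = "svg"  # hypothetical choice; change to the tag you want to preserve


def is_raw_html(node):
    """True for RawBlock/RawInline nodes whose format is "html"."""
    return (
        isinstance(node, dict)
        and node.get("t") in ("RawBlock", "RawInline")
        and node["c"][0] == "html"
    )


def keep(node, state):
    """Decide whether a raw HTML node belongs to a KEEP_TAG element.

    Assumption: the element may arrive as one node or as one node per tag,
    so we track whether we are currently between <svg ...> and </svg>.
    Nested or self-closing KEEP_TAG elements are not handled.
    """
    text = node["c"][1].lower()
    opens = "<" + KEEP_TAG in text
    closes = "</" + KEEP_TAG in text
    result = state["inside"] or opens
    if opens:
        state["inside"] = True
    if closes:
        state["inside"] = False
    return result


def walk(value, state):
    """Copy the AST in document order, dropping unwanted raw HTML nodes."""
    if isinstance(value, list):
        return [
            walk(item, state)
            for item in value
            if not is_raw_html(item) or keep(item, state)
        ]
    if isinstance(value, dict):
        return {key: walk(val, state) for key, val in value.items()}
    return value


# pandoc passes the output format as the first argument to JSON filters,
# so the script only reads stdin when it is invoked that way.
if __name__ == "__main__" and len(sys.argv) > 1:
    document = json.load(sys.stdin)
    json.dump(walk(document, {"inside": False}), sys.stdout)
```

Under those assumptions, it could be used as follows (input.html is a placeholder file name):

    pandoc -f html+raw_html -t markdown --filter ./keep_svg.py input.html

A Lua filter would avoid the JSON round trip; Python is used here only to keep the sketch self-contained.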