pandoc commonmark

Converting from docx to Markdown: how to get rid of the underline span in links?


Since a recent pandoc update (I'm now on 2.2.1), links in a docx document are converted to [<span class="underline">graphic novel hero</span>](https://www.amazon.com/exec/obidos/ASIN/1596432594/braipick-20), which adds an unneeded span to the link label. Is there any black magic (besides adding a sed call to the pipeline) to get rid of these spans and return to pure CommonMark?

The pandoc options I use are: pandoc -f docx --atx-headers --wrap=none --extract-media=. -t commonmark-smart myFile.docx

Thanks for clarifying!


Solution

  • If you use -t commonmark, the spans that the docx reader generates are written out as raw HTML, so you can suppress them by disabling the raw_html extension:

    pandoc -t commonmark-raw_html
    

    Alternatively, use the markdown writer, which is more flexible in terms of extensions (but, as of 2018, not yet 100% CommonMark-compliant):

    pandoc -t markdown-bracketed_spans-raw_html-native_spans
    

    See the MANUAL for more details.
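
For reference, here is a minimal sketch of how the first suggestion might be combined with the options from the question. The function name docx_to_commonmark is hypothetical, and stacking -smart and -raw_html on the same writer is an assumption based on how pandoc extension modifiers are normally chained, so treat it as a starting point rather than a verified recipe:

```shell
# Hypothetical helper: the asker's original options, with raw_html disabled
# so the <span class="underline"> wrappers should no longer be emitted.
# Assumes pandoc 2.x; newer releases replace --atx-headers with
# --markdown-headings=atx.
docx_to_commonmark() {
  pandoc -f docx --atx-headers --wrap=none --extract-media=. \
    -t commonmark-smart-raw_html "$1"
}
```

Calling docx_to_commonmark myFile.docx > myFile.md should then give plain link labels such as [graphic novel hero](https://www.amazon.com/exec/obidos/ASIN/1596432594/braipick-20). Note that disabling raw_html drops any other raw HTML the writer would have produced as well, not just the underline spans.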