
How can I remove pandoc EPUB links in markdown conversion?


I am using the following shell command to convert from EPUB to Markdown:

pandoc "input.epub" -t gfm-raw_html -o output.md 

It works well, except that the EPUB's internal links remain in the output, for example:

[About the Author](#part0000_split_000.html_x9780698161863_EPUB)

Is there a pandoc option to remove those?


Solution

  • The pandoc filters documentation provides example code that does precisely this:

    What if we want to remove every link from a document, retaining the link’s text?

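    #!/usr/bin/env runhaskell
    -- delink.hs
    import Text.Pandoc.JSON
    
    -- Read the JSON AST from stdin, apply delink to every inline, write it back.
    main :: IO ()
    main = toJSONFilter delink
    
    -- Replace each link with its text; leave every other inline unchanged.
    delink :: Inline -> [Inline]
    delink (Link _ txt _) = txt
    delink x              = [x]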
    

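    It can be applied as a filter with the -F (--filter) option, provided runhaskell and the pandoc-types library are installed: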

    pandoc "input.epub" -F delink.hs -t gfm-raw_html -o output.md
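
    The filter above strips every link, including external ones. If only the EPUB's internal anchors (targets beginning with "#", as in the question) should go, a variant along these lines might work. This is a minimal sketch, not part of the pandoc documentation: the file name delink-internal.hs and the function name delinkInternal are hypothetical, and it assumes pandoc-types 1.20 or later, where link targets are Text rather than String.

    ```haskell
    #!/usr/bin/env runhaskell
    {-# LANGUAGE OverloadedStrings #-}
    -- delink-internal.hs (hypothetical file name)
    -- Assumes pandoc-types >= 1.20, where a link target is a pair of Text values.
    import qualified Data.Text as T
    import Text.Pandoc.JSON

    main :: IO ()
    main = toJSONFilter delinkInternal

    -- Unwrap links whose URL starts with '#'; keep all other inlines,
    -- including external links, unchanged.
    delinkInternal :: Inline -> [Inline]
    delinkInternal (Link _ txt (url, _))
      | "#" `T.isPrefixOf` url = txt
    delinkInternal x = [x]
    ```

    It would be invoked the same way as delink.hs, by passing -F delink-internal.hs in place of -F delink.hs.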