
Replace HTML tags using Pandoc Lua filter when converting from markdown to HTML


I have a Markdown file that contains some HTML tags. One in particular, the <br> tag, I would like to replace when converting to HTML with pandoc. I want it to become <br /> because some older renderers complain about <br>. I tried the following Lua filter when running the conversion, but it had no effect:

filter.lua:

function LineBreak (elem)
    return {
        pandoc.RawInline('html', '<br />')
    }
end

I'm using pandoc 2.13 with the following test file and command:

Test.md:

## Testing

Hello <br> World!

pandoc --lua-filter filter.lua --to html5 Test.md

I have also tried specifying --to html4, but the output is the same. Is there a way to do this with Lua filters?


Solution

  • To debug this, we can first run pandoc --to=native Test.md to see how the input is parsed into pandoc's internal document representation. This yields:

    [Header 2 ("testing",[],[]) [Str "Testing"]
    ,Para [Str "Hello",Space,RawInline (Format "html") "<br>",Space,Str "World!"]]
    

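    The interesting part is that <br> is parsed as RawInline (Format "html") "<br>", not as a LineBreak. A LineBreak filter function is only called for LineBreak elements, so the original filter never sees the tag. We can modify the filter to match on the raw HTML element instead: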

    function RawInline (raw)
      if raw.format == 'html' and raw.text == '<br>' then
        return pandoc.RawInline('html', '<br />')
      end
    end
    

    This gives the desired result:

    $ pandoc --lua-filter filter.lua --to html5 Test.md
    <h2 id="testing">Testing</h2>
    <p>Hello <br /> World!</p>
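
Note that the filter above matches only the exact text <br>. As a rough sketch, assuming the input might also contain spellings such as <br/>, <BR> or <br >, the comparison could be loosened with a Lua pattern. The helper name normalize_br is made up for this illustration and is not part of pandoc's API; tags that carry attributes (for example <br clear="all">) are deliberately left untouched.

```lua
-- Hypothetical helper (not a pandoc function): returns '<br />' when `text`
-- is a bare <br> tag in any common spelling, and nil otherwise.
function normalize_br(text)
  -- '<', then 'br' in either case, optional whitespace, optional '/', then '>'
  if text:match('^<[bB][rR]%s*/?>$') then
    return '<br />'
  end
  return nil
end

-- Filter function: pandoc calls this for every RawInline element.
function RawInline(raw)
  if raw.format ~= 'html' then
    return nil -- returning nil leaves the element unchanged
  end
  local replacement = normalize_br(raw.text)
  if replacement then
    return pandoc.RawInline('html', replacement)
  end
end
```

It is used the same way as before, with pandoc --lua-filter filter.lua --to html5 Test.md. Keeping the string check in a separate plain-Lua function means it can be tried out in an ordinary Lua interpreter without pandoc.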