Tags: haskell, pandoc, katex, hakyll

Pandoc 3: Obtain and modify the default context for Markdown to HTML conversion in Hakyll 4?


I am currently playing around with Hakyll and Pandoc.

I want to create a static HTML website from Markdown sources including inline maths in LaTeX. Using pandoc-katex I was able to do the conversion with the following command:

$ pandoc -f markdown -t html --filter pandoc-katex --css "https://cdn.jsdelivr.net/npm/katex@$(pandoc-katex --katex-version)/dist/katex.min.css" --css "https://pandoc.org/demo/pandoc.css" --standalone -o output.html input.md

However, I want to use the pandoc-katex filter in Hakyll and obtain the exact same result as with the command above (for now), i.e. I want to use Pandoc's standard HTML template, make it load the two CSS files and process any available metadata in the input.md in exactly the same way as the command above does.

I exported the standard HTML template as follows:

$ pandoc -D html > default-template.html

Using pandocCompilerWithTransformM, I was able to use the pandoc-katex filter:

katexCompiler = pandocCompilerWithTransformM defaultHakyllReaderOptions defaultHakyllWriterOptions katexFilter
  where
    katexFilter = recompilingUnsafeCompiler
                . runIOorExplode
                . applyFilters noEngine def [JSONFilter "pandoc-katex"] []

Using this compiler in Hakyll, I only get the body part of the HTML file, though. I searched online for solutions, but all the information I found seems to refer to outdated versions of Pandoc. Apparently earlier versions of Pandoc had a writerStandalone option, but it no longer exists (even though the command-line tool still has optStandalone, and the --standalone parameter used above evidently works).

What I currently do is apply the default template with loadAndApplyTemplate "templates/default-template.html" myCtx and then try to replicate the default context manually in myCtx. This is obviously not how it should be done.

Here is a somewhat minimal example of my attempt (sorry that it's still a bit lengthy - exactly that is the problem):

{-# LANGUAGE OverloadedStrings #-}

import Text.Pandoc
import Text.Pandoc.Filter
import Text.Pandoc.Scripting
import Hakyll

css1Item, css2Item, authorItem :: Item String
css1Item   = Item (fromFilePath "css/katex.min.css") "https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css"
css2Item   = Item (fromFilePath "css/pandoc.css") "https://pandoc.org/demo/pandoc.css"
authorItem = Item (fromFilePath "general") "Jon Doe"

stylesString :: String
stylesString = "/* 15 lines of CSS */"

myCtx :: Context String
myCtx = dateField "date" "%B %e, %Y"
       <> constField "pagetitle" "My Title"
       <> constField "styles.html" stylesString
       <> listCtx "author" [authorItem]
       <> listCtx "author-meta" [authorItem]
       <> listCtx "css" [css1Item, css2Item]
       <> listCtx "header-includes" []
       <> listCtx "include-before" []
       <> listCtx "include-after" []
       <> defaultContext

listCtx :: String -> [Item String] -> Context String
listCtx name lst = listField name ctx (return lst)
  where ctx = field name (return . itemBody)

katexCompiler :: Compiler (Item String)
katexCompiler = pandocCompilerWithTransformM defaultHakyllReaderOptions defaultHakyllWriterOptions katexFilter
  where
    katexFilter = recompilingUnsafeCompiler
                . runIOorExplode
                . applyFilters noEngine def [JSONFilter "pandoc-katex"] []

main :: IO ()
main = hakyll $ do
  match "templates/default-template.html" $ compile templateBodyCompiler
  match "input.md" $ do
      route   $ setExtension ".html"
      compile $ katexCompiler
                >>= loadAndApplyTemplate "templates/default-template.html" myCtx

I have two concrete questions:

Apart from these concrete questions, the general question is:


Solution

  • The nicest way to do that is probably using writerTemplate in Pandoc's WriterOptions to pass the default template, as given by compileDefaultTemplate:

    {-# LANGUAGE OverloadedStrings #-}

    import Hakyll
    import Text.Pandoc

    main :: IO ()
    main = do
        -- Compile Pandoc's built-in HTML template once, before Hakyll runs.
        pandocTmpl <- runIOorExplode $ compileDefaultTemplate "html"
        let katexOpts = defaultHakyllWriterOptions
                { writerTemplate = Just pandocTmpl
                -- The argument to KaTeX is the base URL of the KaTeX assets.
                , writerHTMLMathMethod = KaTeX ""
                -- And whatever else you need.
                }
            -- Defining it this way because pandocCompilerWith strips
            -- the metadata block before handing the body to Pandoc,
            -- whereas getResourceString keeps it.
            --
            -- I'm relying on Pandoc's built-in KaTeX support. If
            -- you'd rather stick with the pandoc-katex filter, you
            -- can use renderPandocWithTransformM to reshape the
            -- compiler you defined in the question in this fashion.
            katexCompiler = do
                fullItem <- getResourceString
                renderPandocWith defaultHakyllReaderOptions katexOpts fullItem

        hakyll $ do
            -- etc.
            match "input.md" $ do
                route $ setExtension ".html"
                compile katexCompiler


    See also pandoc issue #10209, which points to a similar approach.
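
    For the filter-based variant mentioned in the code comment, a minimal sketch might look like the following. It assumes a Hakyll version that exports renderPandocWithTransformM and that the pandoc-katex executable is on the PATH. The names katexFilter, filterOpts and filterCompiler are only illustrative.

    ```haskell
    {-# LANGUAGE OverloadedStrings #-}

    import Hakyll
    import Text.Pandoc (Pandoc, WriterOptions (..), compileDefaultTemplate, def, runIOorExplode)
    import Text.Pandoc.Filter (Filter (JSONFilter), applyFilters)
    import Text.Pandoc.Scripting (noEngine)

    -- The same filter as in the question, lifted into Hakyll's Compiler monad.
    katexFilter :: Pandoc -> Compiler Pandoc
    katexFilter =
        recompilingUnsafeCompiler
            . runIOorExplode
            . applyFilters noEngine def [JSONFilter "pandoc-katex"] []

    main :: IO ()
    main = do
        pandocTmpl <- runIOorExplode $ compileDefaultTemplate "html"
        let filterOpts = defaultHakyllWriterOptions
                { writerTemplate = Just pandocTmpl }
            -- getResourceString keeps the metadata block, so Pandoc sees it.
            filterCompiler =
                getResourceString
                    >>= renderPandocWithTransformM
                            defaultHakyllReaderOptions filterOpts katexFilter

        hakyll $
            match "input.md" $ do
                route $ setExtension ".html"
                compile filterCompiler
    ```

    Here the math is rendered by the external filter, so writerHTMLMathMethod is left at Hakyll's default.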


    Side questions:

    How should I think of these Items?

    Item indeed is primarily meant for things bound to a file path in your site tree. Occasionally, it makes sense to use a fake path for the identifier — for instance, when synthesising some content with a create rule. However, that's not typically something one would want to do for the sake of setting a context field, as there likely are more straightforward ways to do that. (In particular, if, unlike in this answer, you are using Hakyll's templates, you don't have to explicitly define the fields that you include in the metadata headers of your source files, as Hakyll's defaultContext covers that already by including metadataField.)
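
    If you do stay with Hakyll's templates and need a list-valued field, one possibly simpler route than hand-built Items with fake paths is makeItem, which wraps a value in an Item tied to the resource currently being compiled. The helper stringListField below is hypothetical (not part of Hakyll); the URLs and author name are the ones from the question.

    ```haskell
    {-# LANGUAGE OverloadedStrings #-}

    import Hakyll

    -- Hypothetical helper: a list field whose entries are plain strings.
    -- Inside the template loop, each entry is available under the same name.
    stringListField :: String -> [String] -> Context a
    stringListField name values =
        listField name (field name (return . itemBody)) (mapM makeItem values)

    myCtx :: Context String
    myCtx =
        stringListField "css"
            [ "https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css"
            , "https://pandoc.org/demo/pandoc.css"
            ]
            <> stringListField "author" ["Jon Doe"]
            <> defaultContext

    main :: IO ()
    main = hakyll $ do
        match "templates/default-template.html" $ compile templateBodyCompiler
        match "input.md" $ do
            route $ setExtension ".html"
            compile $ pandocCompiler
                >>= loadAndApplyTemplate "templates/default-template.html" myCtx
    ```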

    Is there a way to obtain the Context that the command line tool uses, when making the conversion?

    While Pandoc offers ways to manipulate its own metadata (which I have never used myself; Text.Pandoc.Writers.Shared might be a good place to start browsing), the template systems of Pandoc and Hakyll are similar-looking but distinct, and in particular Hakyll's Context type is not the same as its Pandoc counterpart.
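
    As a rough illustration of the Pandoc side, the sketch below sets the css template variable through writerVariables, using defField from Text.Pandoc.Writers.Shared. It assumes a Pandoc version in which writerVariables is a doctemplates Context Text. I have not checked how such variables interact with a css entry in the document's own metadata, so treat it as a starting point only. The names cssUrls and opts are illustrative.

    ```haskell
    {-# LANGUAGE OverloadedStrings #-}

    import Data.Text (Text)
    import Hakyll
    import Text.Pandoc (WriterOptions (..), compileDefaultTemplate, runIOorExplode)
    import Text.Pandoc.Writers.Shared (defField)

    -- Stylesheet URLs taken from the question.
    cssUrls :: [Text]
    cssUrls =
        [ "https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css"
        , "https://pandoc.org/demo/pandoc.css"
        ]

    main :: IO ()
    main = do
        pandocTmpl <- runIOorExplode $ compileDefaultTemplate "html"
        let opts = defaultHakyllWriterOptions
                { writerTemplate = Just pandocTmpl
                -- Add "css" to the template variables unless already set.
                , writerVariables =
                    defField "css" cssUrls
                        (writerVariables defaultHakyllWriterOptions)
                }

        hakyll $
            match "input.md" $ do
                route $ setExtension ".html"
                compile $
                    getResourceString
                        >>= renderPandocWith defaultHakyllReaderOptions opts
    ```

    This is a Pandoc template variable, not a Hakyll Context field, so it only affects the template passed via writerTemplate.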


    On a final note, it is worth mentioning that if you were completely stuck trying to reproduce Pandoc's output within Hakyll, a last resort would be using unixFilter to set up a compiler that shells out to command-line Pandoc.
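
    A minimal sketch of that last resort, assuming pandoc and pandoc-katex are both on the PATH. The arguments mirror the command from the question, with the KaTeX version pinned to 0.16.4 as in the question's code; pandocCliCompiler is an illustrative name.

    ```haskell
    {-# LANGUAGE OverloadedStrings #-}

    import Hakyll

    -- Feeds the full source (metadata block included) to pandoc on stdin
    -- and uses whatever pandoc prints on stdout as the page.
    pandocCliCompiler :: Compiler (Item String)
    pandocCliCompiler =
        getResourceString >>= withItemBody (unixFilter "pandoc" args)
      where
        args :: [String]
        args =
            [ "-f", "markdown"
            , "-t", "html"
            , "--filter", "pandoc-katex"
            , "--css", "https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css"
            , "--css", "https://pandoc.org/demo/pandoc.css"
            , "--standalone"
            ]

    main :: IO ()
    main = hakyll $
        match "input.md" $ do
            route $ setExtension ".html"
            compile pandocCliCompiler
    ```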