Tags: python, markdown, pandoc, plantuml

How to get captions with markdown in pandocfilters?


I tried the PlantUML filter to generate LaTeX figures from PlantUML code in markdown source. It works nicely (I changed it to generate PDF for LaTeX since it preserves the text items in the PlantUML diagrams).

The trouble with this filter (and all the filters using the pandocfilters API) is that captions don't support markdown. That is, passing caption="Here is a diagram that is *not* what you'd expect." produces a LaTeX figure whose caption contains the literal text *not*, asterisks included, rather than the word not in italics.

My workaround is to add two keys to the filter: hide-image=true and plantuml-filename=foo.pdf. With these set, the filter returns nothing in the AST for the diagram and only writes the output file foo.pdf. Then, I can get markdown formatting of a caption by using a traditional figure:

```{.plantuml hide-image=true plantuml-filename=foo.pdf}
@startuml
A -> B : hello
@enduml
```

![Here is a diagram that is *not* what you'd expect](foo.pdf)

This works well, but defining the filename is extra work.

get_caption in pandocfilters.py is like this:

def get_caption(kv):
    """get caption from the keyvalues (options)
    Example:
      if key == 'CodeBlock':
        [[ident, classes, keyvals], code] = value
        caption, typef, keyvals = get_caption(keyvals)
        ...
        return Para([Image([ident, [], keyvals], caption, [filename, typef])])
    """
    caption = []
    typef = ""
    value, res = get_value(kv, u"caption")
    if value is not None:
        caption = [Str(value)]
        typef = "fig:"

    return caption, typef, res

Is there a (simple) way to modify this so get_caption can respect markdown inside?

I thought Inline might be a way to mark a caption as containing markdown, but it isn't a constructor defined in pandocfilters.py, perhaps because at the point where the filter runs, attribute values are not expected to contain nested markup.

My (hacked) version of the PlantUML filter is on GitHub:

#!/usr/bin/env python

"""
Pandoc filter to process code blocks with class "plantuml" into
plant-generated images.

Needs `plantuml.jar` from http://plantuml.com/.
"""

import os
import shutil
import sys
from subprocess import call

from pandocfilters import toJSONFilter, Para, Image, get_filename4code, get_caption, get_extension


def plantuml(key, value, format, _):
    if key == 'CodeBlock':
        [[ident, classes, keyvals], code] = value

        if "plantuml" in classes:
            caption, typef, keyvals = get_caption(keyvals)

            filename = get_filename4code("plantuml", code)
            filetype = get_extension(format, "png", html="svg", latex="pdf")

            src = filename + '.puml'
            plantuml_output = filename + '.' + filetype

            dest_spec = ""
            # Key to specify the final destination of the file
            for ind, keyval in enumerate(keyvals):
                if keyval[0] == 'plantuml-filename':
                    dest_spec = keyval[1]
                    keyvals.pop(ind)
                    break

            # Generate image only once
            if not os.path.isfile(plantuml_output):
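                txt = code.encode(sys.getfilesystemencoding())
                # txt is bytes after encoding, so compare and write it as bytes
                # (str prefixes and text-mode files fail here on Python 3)
                if not txt.startswith(b"@start"):
                    txt = b"@startuml\n" + txt + b"\n@enduml\n"
                with open(src, "wb") as f:
                    f.write(txt)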
                # Must not let messages go to stdout, as it will corrupt JSON in filter
                with open('plantUMLErrors.log', "w") as log_file:
                    call(["java", "-jar", "filters/plantuml/plantuml.jar", "-t"+filetype, src], stdout=log_file)
                sys.stderr.write('Created image ' + plantuml_output + '\n')
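            # Copy to the requested destination even when the image was already
            # generated, so a cached diagram still ends up under dest_spec
            if dest_spec:
                sys.stderr.write('Copying image from ' + plantuml_output + ' to ' + dest_spec + '\n')
                shutil.copy2(plantuml_output, dest_spec)
                plantuml_output = dest_spec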


            for ind, keyval in enumerate(keyvals):
                if keyval[0] == 'hide-image':
                    if keyval[1] == 'true':
                        sys.stderr.write('Not showing image ' + plantuml_output + '\n')
                        return []  # suppress image in JSON

            return Para([Image([ident, [], keyvals], caption, [plantuml_output, typef])])

if __name__ == "__main__":
    toJSONFilter(plantuml)

Solution

  • The Lua version of the PlantUML filter just works, at least since I moved my project to Quarto.

    Edit: The filter migrated to its own repo: https://github.com/pandoc-ext/diagram

    I don't have to worry about file names either if I use latexmk as the PDF engine and standalone for HTML output.
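
If switching to the Lua filter is not an option, one possible way to make the caption respect markdown from Python is to hand the caption string back to pandoc and reuse the inline elements it returns. The following is only a minimal sketch under several assumptions: the pandoc executable is on PATH, it emits the JSON layout with a top-level "blocks" key (pandoc 1.18 or later), and Python 3.5+ is available for subprocess.run. The names get_caption_md, markdown_to_inlines and inlines_from_pandoc_json are hypothetical helpers, not part of pandocfilters.

```python
import json
import subprocess


def inlines_from_pandoc_json(doc_json):
    """Return the inlines of the first Para/Plain block in a pandoc JSON document.

    Assumes the JSON layout with a top-level "blocks" list (pandoc >= 1.18).
    """
    doc = json.loads(doc_json)
    for block in doc.get("blocks", []):
        if block.get("t") in ("Para", "Plain"):
            return block["c"]
    return []


def markdown_to_inlines(text):
    """Parse a markdown string by running pandoc (assumed to be on PATH)."""
    result = subprocess.run(
        ["pandoc", "--from", "markdown", "--to", "json"],
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return inlines_from_pandoc_json(result.stdout.decode("utf-8"))


def get_caption_md(kv, to_inlines=markdown_to_inlines):
    """Hypothetical variant of get_caption that parses the caption as markdown.

    Returns (caption_inlines, typef, remaining_keyvals), like get_caption.
    """
    caption, typef, rest = [], "", []
    for key, val in kv:
        if key == "caption":
            caption = to_inlines(val)  # list of inline AST nodes, not one Str
            typef = "fig:"
        else:
            rest.append([key, val])
    return caption, typef, rest
```

In the filter, `caption, typef, keyvals = get_caption_md(keyvals)` would take the place of the get_caption call. The returned inlines are plain dicts of the same shape that the pandocfilters constructors build, so they can be passed to Image as they are. The cost is one extra pandoc process per captioned diagram, and the pandoc that parses the caption should be the same version as the one running the filter so the AST formats agree.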