markdown sgml

SGML parser for Markdown possible?


SGML has many optional features for markup minimization, such as optional or implied start and end tags, and SHORTREF for defining shorter aliases for tags. Is it therefore possible to write a DTD that a perfect SGML implementation (always a rare to non-existent thing) could use to parse arbitrary markdown documents successfully?

Existing markdown parsers differ from one another, which CommonMark tries to standardize away, so an SGML-based parser has some leeway in edge cases.


Solution

  • Many markdown constructs can be parsed into HTML using SGML short references, but markdown's inline links and reference links can't.

    Inline links such as [link text](link URL) are problematic because the href attribute of the resulting a element must be populated with the link URL, and SGML short references cannot do that: a short reference only replaces a delimiter with a fixed entity and cannot move document text into an attribute value. Reference links additionally require unbounded lookahead, since a link definition can appear anywhere in the text, before or after the place where it is used.

    Another problem is markdown auto-escaping and auto-links.

    Edit: for your information, sgmljs.net (my project) contains a full markdown (plus common extensions) to HTML translation embedded in an SGML parser. However, it only exposes the markdown short reference map declarations "virtually", via a public identifier that "magically" switches on markdown-to-HTML translation when referenced in a document's prolog; the actual markdown translation and processing is hard-coded in JavaScript (see http://sgmljs.net/docs/markdown.html).

    A problem with using markdown from SGML that is not mentioned above is that markdown wants a "markup block" (an HTML block generalized to allow any explicit element tags or other markup constructs) to be separated by newline(s) from the preceding or following markdown text, a constraint that cannot be captured in SGML.
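To make the two link problems above more concrete, here is a minimal Python sketch. It is not SGML and not how sgmljs.net works; it is only a toy model under stated assumptions. `apply_shortrefs` imitates short references as "delimiter in, fixed tag out, with the active map chosen by the currently open element", which is enough for emphasis but has no way to carry a URL into an attribute. `resolve_reference_links` shows why reference links need the whole document: definitions are collected in a first pass before any link can be emitted. All names, the map table, and the simplified regular expressions are hypothetical and cover only the simplest link forms.

```python
import re

# Toy model of short references: each delimiter maps to a FIXED replacement.
# Which map is active depends only on the currently open element, loosely
# imitating how a short reference map is associated with an element.
SHORTREF_MAPS = {
    "p": {"*": ("<em>", "em")},    # inside p, "*" opens em
    "em": {"*": ("</em>", "p")},   # inside em, "*" closes em
}


def apply_shortrefs(text: str) -> str:
    """Replace delimiters with fixed tags, switching maps per open element."""
    current = "p"
    out = []
    for ch in text:
        entry = SHORTREF_MAPS[current].get(ch)
        if entry is None:
            out.append(ch)
        else:
            replacement, current = entry
            out.append(replacement)
    return "".join(out)


# Simplified patterns (an assumption, far from full markdown syntax):
# a definition line "[label]: url" and a usage "[text][label]".
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+)\s*$", re.MULTILINE)
REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")


def resolve_reference_links(text: str) -> str:
    """Two-pass resolution: gather every definition, then rewrite usages."""
    # Pass 1: scan the WHOLE input, because a definition may appear
    # before or after the place where it is used.
    definitions = {m.group(1).lower(): m.group(2)
                   for m in DEFINITION.finditer(text)}
    body = DEFINITION.sub("", text)

    # Pass 2: only now can the href attribute value be filled in.
    def link(m: re.Match) -> str:
        label = (m.group(2) or m.group(1)).lower()
        url = definitions.get(label)
        if url is None:
            return m.group(0)  # unknown label: leave the text untouched
        return f'<a href="{url}">{m.group(1)}</a>'

    return REFERENCE.sub(link, body).strip()


if __name__ == "__main__":
    print(apply_shortrefs("a *b* c"))
    print(resolve_reference_links(
        "See [the spec][cm].\n\n[cm]: https://example.com/spec\n"))
```

The emphasis case works with a single left-to-right pass and fixed replacements, whereas the link case yields the same output whether the definition comes before or after the usage only because the first pass has seen the entire text.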