xml dtd sgml

Are DTD external unparsed entities and notations used in the wild?


DTDs provide a mechanism for referencing external entities of arbitrary formats, thus allowing SGML and XML files to link to any file with a URI without creating a custom mechanism for that. So, for example, one could specify in a DTD:

<!ELEMENT img EMPTY>
<!ATTLIST img src ENTITY #REQUIRED>
<!NOTATION gif PUBLIC "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN" "image/gif">
<!ENTITY myimg1 SYSTEM "img1.gif" NDATA gif>
<!ENTITY myimg2 SYSTEM "img2.gif" NDATA gif>
<!ENTITY myimg3 SYSTEM "img3.gif" NDATA gif>

When creating an img element, one could then give its src attribute a value like myimg1, and the application working with the document should be informed that the file img1.gif is referenced, along with its format (the gif notation).
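To make "the application should be informed" concrete, here is a minimal sketch of how a non-validating parser could report these declarations. It assumes Python's standard xml.parsers.expat module and places the DTD above in an internal subset; the function name resolve_unparsed_entities and the record keys are made up for illustration, not part of any standard API.

```python
import xml.parsers.expat

# The DTD from above as an internal subset, plus a single img element.
DOC = """<!DOCTYPE img [
<!ELEMENT img EMPTY>
<!ATTLIST img src ENTITY #REQUIRED>
<!NOTATION gif PUBLIC "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN" "image/gif">
<!ENTITY myimg1 SYSTEM "img1.gif" NDATA gif>
]>
<img src="myimg1"/>
"""


def resolve_unparsed_entities(doc):
    """Return one record per ENTITY-typed attribute that names an NDATA entity."""
    notations = {}        # notation name -> (public id, system id)
    entities = {}         # entity name -> (system id, notation name)
    entity_attrs = set()  # (element, attribute) pairs declared with type ENTITY
    found = []

    def on_notation(name, base, system_id, public_id):
        notations[name] = (public_id, system_id)

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        # Only unparsed (NDATA) entities carry a notation name.
        if notation is not None:
            entities[name] = (system_id, notation)

    def on_attlist(element, attribute, attr_type, default, required):
        if attr_type == "ENTITY":
            entity_attrs.add((element, attribute))

    def on_start(element, attrs):
        # The parser only hands over the attribute string; mapping it back
        # to a file and a format is left to the application.
        for attribute, value in attrs.items():
            if (element, attribute) in entity_attrs and value in entities:
                system_id, notation = entities[value]
                public_id, notation_system_id = notations.get(notation, (None, None))
                found.append({
                    "element": element,
                    "attribute": attribute,
                    "file": system_id,
                    "notation": notation,
                    "notation_public_id": public_id,
                    "notation_system_id": notation_system_id,
                })

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(doc, True)
    return found


for record in resolve_unparsed_entities(DOC):
    print(record["element"], record["attribute"], record["file"], record["notation"])
```

Note that the parser never opens img1.gif; it only reports the declarations, and deciding what the gif notation means is entirely up to the application.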

The way I understand it, there are three advantages to this:

Yet, so far I haven't been able to find any dataset or application that predominantly uses this mechanism. In practice, all these points are defeated:

All tutorials about this mechanism I've found simply state what this can be used for (mostly copying paragraphs from other documents) with examples of custom DTDs like the one above. Additionally, since an entity like this can only be included in an attribute, it can never actually be considered a part of the content of any element and its processing is always dependent on the application.

Is there a system using or relying on external entities and notations? Are there applications that recognize entities used this way and are able to understand notations? What kind of public IDs for notations can I use reasonably, and what are some real-world examples of system IDs? And are there common public IDs for entities or notations?


Solution

  • Notations and unparsed entities are notably used by DocBook and TEI.

    They are also used for a general templating/parametric macro expansion mechanism in my SGML software (http://sgmljs.net), much in the spirit of adding features to SGML without new syntax. Specifically, in SGML (but not XML), entity declarations can have data attributes, as in

    <!ENTITY e SYSTEM "..." NDATA sgml [ x=1 y=2 ]>
    

    Support for XLink/XInclude is generally just as spotty as, or arguably even spottier than, support for entity/notation declarations, given that the latter are core SGML/XML constructs (see e.g. "Trying to use XInclude with Java and resolving the fragment with xml:id"). The graver concern with XInclude is that it interacts with schema validation in unintended ways (see "XInclude Schema/Namespace Validation?") because it is layered as an XML application/vocabulary rather than being a core feature.

    XLink might be nice on paper (I don't think it's even that, given that it blindly brings over HyTime concepts without context, e.g. with an extremely vague specification of link roles other than plain HTML-like links). But the reality is that the most common document format out there by far (i.e. HTML) makes use of URLs, which XML can't reasonably deal with at all, given that URLs may, and frequently do, contain & (ampersand) characters, which XML always wants to interpret as the start of entity references. The WebSGML revision of SGML (created by the authors of the original XML spec, along with introducing XML as a standalone subset of SGML, to align these two specs) introduced data specification attributes (explained in http://sgmljs.net/docs/parsing-html-tutorial/parsing-html-tutorial.html) to deal with this problem specifically.

    Update: regarding commonly used public identifiers for notations to use in SGML and XML, there's

    There are also the well-known entity sets for special characters in SGML, HTML, and XML (and specifically MathML).
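Coming back to the ampersand issue mentioned above, the following is a small sketch of what happens when a URL with a query string is placed verbatim into an XML attribute. It assumes Python's standard xml.etree.ElementTree and xml.sax.saxutils modules; the URL and the helper name parses are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical URL with a query string, as commonly found in HTML documents.
URL = "http://example.com/search?q=sgml&lang=en"


def parses(markup):
    """Return True if markup is accepted by the XML parser, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The bare "&" is read as the start of an entity reference and is rejected.
raw = '<a href="%s"/>' % URL

# quoteattr() escapes "&" as "&amp;" and adds the surrounding quotes.
escaped = "<a href=%s/>" % quoteattr(URL)

print(parses(raw))      # False
print(parses(escaped))  # True
print(ET.fromstring(escaped).get("href") == URL)  # True
```

So in XML every such URL has to be rewritten with &amp; before it can be stored in an attribute, whereas HTML documents in the wild frequently contain them unescaped.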