DTDs provide a mechanism for referencing external entities of arbitrary formats, thus allowing SGML and XML files to link to any file with a URI without creating a custom mechanism for that. So, for example, one could specify in a DTD:
<!ELEMENT img EMPTY>
<!ATTLIST img src ENTITY #REQUIRED>
<!NOTATION gif PUBLIC "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN" "image/gif">
<!ENTITY myimg1 SYSTEM "img1.gif" NDATA gif>
<!ENTITY myimg2 SYSTEM "img2.gif" NDATA gif>
<!ENTITY myimg3 SYSTEM "img3.gif" NDATA gif>
When creating an img element, one could then use a value like myimg1 and the application working with the document should be informed that file img1.gif is referenced, with a specific format.
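To make "the application should be informed" concrete, here is a minimal sketch in Python using the standard library's xml.parsers.expat module, which reports notation and unparsed entity declarations through callbacks. The sample document (the DTD above as an internal subset plus a single img element) and the helper names collect_declarations and resolve_img_src are assumptions made for illustration. Note that expat is non-validating, so it does not check that src really names a declared entity; the lookup is left to the application.

```python
import xml.parsers.expat

# Hypothetical sample document: the DTD from the question as an internal
# subset, plus one img element referring to the entity myimg1.
DOC = """<?xml version="1.0"?>
<!DOCTYPE img [
<!ELEMENT img EMPTY>
<!ATTLIST img src ENTITY #REQUIRED>
<!NOTATION gif PUBLIC "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN" "image/gif">
<!ENTITY myimg1 SYSTEM "img1.gif" NDATA gif>
]>
<img src="myimg1"/>
"""


def collect_declarations(text):
    """Parse text and return (notations, entities, elements)."""
    notations = {}  # notation name -> (public id, system id)
    entities = {}   # entity name -> (system id, notation name)
    elements = []   # (tag, attribute dict) in document order

    def on_notation(name, base, system_id, public_id):
        notations[name] = (public_id, system_id)

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        entities[name] = (system_id, notation_name)

    def on_start(tag, attrs):
        elements.append((tag, dict(attrs)))

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.StartElementHandler = on_start
    parser.Parse(text, True)
    return notations, entities, elements


def resolve_img_src(text):
    """Map each img/@src entity name to (file, notation name, notation system id)."""
    notations, entities, elements = collect_declarations(text)
    resolved = []
    for tag, attrs in elements:
        if tag == "img" and attrs.get("src") in entities:
            system_id, notation_name = entities[attrs["src"]]
            # The notation may be undeclared in a non-validated document.
            _public_id, notation_system_id = notations.get(notation_name, (None, None))
            resolved.append((system_id, notation_name, notation_system_id))
    return resolved


if __name__ == "__main__":
    print(resolve_img_src(DOC))
```

The parser only hands over names and identifiers; what to do with a gif notation (render it, link to it, ignore it) remains entirely up to the application, which is the point made further down.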
The way I understand it, there are three advantages to this:
Yet, so far I wasn't able to find any dataset or application which would predominantly use this mechanism. In practice, all these points are defeated:
anyURI so links can still be automatically found (there is a difference between embedding and linking to a resource though).
All tutorials about this mechanism that I've found simply state what it can be used for (mostly copying paragraphs from other documents), with examples of custom DTDs like the one above. Additionally, since an entity like this can only be referenced in an attribute value, it can never actually be considered a part of the content of any element, and its processing is always dependent on the application.
Is there a system using or relying on external entities and notations? Are there applications that recognize entities used this way and are able to understand notations? What kind of public IDs for notations can I use reasonably, and what are some real-world examples of system IDs? And are there common public IDs for entities or notations?
Notations and unparsed entities are notably used by DocBook and TEI.
They are also used for a general templating/parametric macro expansion mechanism in my SGML software (http://sgmljs.net), much in the spirit of adding features to SGML without new syntax. Specifically, in SGML (but not XML), entity declarations can have data attributes, as in
<!ENTITY e SYSTEM "..." NDATA sgml [ x=1 y=2 ]>
Support for XLink/XInclude is generally just as spotty as, or arguably even spottier than, support for entity/notation declarations, given that the latter are core SGML/XML constructs (see e.g. "Trying to use XInclude with Java and resolving the fragment with xml:id"). The graver concern with XInclude is that it interacts with schema validation in unintended ways (see "XInclude Schema/Namespace Validation?") because it is layered as an XML application/vocabulary rather than being a core feature.
XLink might be nice on paper (I don't think it's even that, given that it blindly brings over HyTime concepts without context, e.g. with an extremely vague specification of link roles other than plain HTML-like links). But the reality is that the most common document format out there by far (i.e. HTML) makes use of URLs, which XML can't reasonably deal with at all, given that URLs allow and frequently contain & (ampersand) characters, which XML always wants to interpret as the start of an entity reference. The WebSGML revision of SGML (created by the authors of the original XML spec, along with introducing XML as a standalone subset of SGML to align the two specs) has introduced data specification attributes (explained in http://sgmljs.net/docs/parsing-html-tutorial/parsing-html-tutorial.html) to deal with this problem specifically.
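The ampersand problem is easy to reproduce. The following sketch uses Python's standard xml.etree.ElementTree and xml.sax.saxutils; the URL and the helper name parses are made up for illustration. It shows that a URL pasted verbatim into an attribute is rejected as not well-formed, and that XML only accepts it once every & has been rewritten as &amp;, which is exactly the step HTML authors routinely skip.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical URL with a query string, as commonly found in HTML.
URL = "http://example.com/search?q=sgml&lang=en"


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# URL pasted verbatim: "&lang" looks like the start of an entity reference.
raw = '<a href="%s"/>' % URL

# quoteattr() escapes "&" as "&amp;" and adds the surrounding quotes.
escaped = "<a href=%s/>" % quoteattr(URL)

if __name__ == "__main__":
    print(parses(raw))
    print(parses(escaped))
    print(ET.fromstring(escaped).get("href") == URL)
```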
Update: regarding commonly used public identifiers for notations to use in SGML and XML, there's
the historic, withdrawn ISO/IEC 9070 spec and the identifiers it defines (see http://xml.coverpages.org/wg4-n1990.html)
the older ISO HTML 4 spec (ISO/IEC 15445) assigning alternate public identifiers for (ISO) HTML as opposed to the well-known ones for W3C HTML 4 (see http://www.cs.tcd.ie/misc/15445/15445.dtd)
the storage notation identifiers of ISO/IEC 10744 (HyTime 2nd ed), though these really are only for use in formal system identifiers (see eg http://sgmljs.net/docs/sgmlrefman.html#identifiers for an explanation), among them a convention for defining a notation for an external program to be used as viewer app via MIME/IANA media type associations
a convention to establish new identifiers in ISO/IEC 8879:1986 Technical Corrigendum 2 (aka WebSGML aka Annex K) delegating formation of unique identifiers to domain name resolution; for example, +//IDN www.someisp.net/users/mtb refers to the notation whose spec document lives at the canonical location http://www.someisp.net/users/mtb
There are also the well-known entity sets for special characters in SGML, HTML, and XML (and specifically MathML).