xml dtd

How do I allow #PCDATA without forcing it and how do I deny #PCDATA?


I would like to get an XML structure like:

<root>
    allow<back>no#PCDATA</back>
    allow<front>allow#PCDATA</front>
</root>

I have:

<!ELEMENT root (back?,front?)>
<!ELEMENT back (js*)>
<!ELEMENT front (para*)>

Solution

  • Using XML DTDs, the best you can get is

    <!DOCTYPE root [
      <!ELEMENT root (#PCDATA|back|front)*>
      <!ELEMENT back (js*)>
      <!ELEMENT front (#PCDATA|para)*>
    ]>
    <root>
      allow<back><!-- no#PCDATA --></back>
      allow<front>allow#PCDATA</front>
    </root>
    

    since XML DTDs place restrictions on how the #PCDATA content token can be used: according to the XML specification, it can appear only in a mixed-content declaration, either alone as (#PCDATA) or as the first member of a choice group whose alternatives are separated by the | connector and which is followed by *.

    You can check this example using Libxml2 (the xmllint --valid command line utility).
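
    To make the restriction on #PCDATA concrete, here is a small sketch contrasting the forms an XML parser accepts with a few it rejects. The rejected variants are my own illustrations, written as comments so the fragment itself stays well-formed; they are not taken from the question.

    ```xml
    <!-- Accepted in XML: the two mixed-content forms -->
    <!ELEMENT front (#PCDATA)>
    <!ELEMENT root  (#PCDATA | back | front)*>

    <!-- Rejected in XML (illustrative alternatives for root) -->
    <!-- #PCDATA inside a sequence:          (#PCDATA, back?, front?)  -->
    <!-- #PCDATA not listed first:           (back | #PCDATA | front)* -->
    <!-- occurrence indicator other than *:  (#PCDATA | back | front)+ -->
    ```

    One consequence is that the mixed-content form cannot require back to come before front, or limit either to a single occurrence, so the original (back?,front?) ordering is lost once text is allowed in root.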

    SGML, on the other hand, on which XML is based and of which XML DTDs are designed to be a subset, doesn't have this restriction and allows #PCDATA to occur multiple times in a content model:

    <!DOCTYPE root [
      <!-- NOTE: this is SGML not XML -->
      <!ELEMENT root - - (#PCDATA,((back,#PCDATA,front?)|(front?)))>
      <!ELEMENT back - - (js*)>
      <!ELEMENT front - - (#PCDATA|para)*>
    ]>
    <root>
      allow<back><!-- no#PCDATA --></back>
      allow<front>allow#PCDATA</front></root>
    

    You can check these SGML examples using OpenSP (the osgmlnorm command line utility) or sgmljs (the sgmlproc command line utility). However, there are restrictions with SGML in this context as well: