I would like to get an XML structure like:
<root>
allow<back>no#PCDATA</back>
allow<front>allow#PCDATA</front>
</root>
I have:
<!ELEMENT root (back?,front?)>
<!ELEMENT back (js*)>
<!ELEMENT front (para*)>
Using XML DTDs, the best you can get is:
<!DOCTYPE root [
<!ELEMENT root (#PCDATA|back|front)*>
<!ELEMENT back (js*)>
<!ELEMENT front (#PCDATA|para)*>
]>
<root>
allow<back><!-- no#PCDATA --></back>
allow<front>allow#PCDATA</front>
</root>
since the XML specification restricts how the #PCDATA content token can be used in a DTD: it may only appear in a mixed-content declaration, as the first token of a choice group whose remaining members are element names separated by the | connector, and a group that lists any element names must be followed by the * occurrence indicator.
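The rule above corresponds to the Mixed production in the XML 1.0 specification. As a rough Python sketch (not a DTD parser), the regular expression below accepts only content models of the forms (#PCDATA) and (#PCDATA|name|...)*. The NAME pattern is an ASCII-only simplification of XML's real Name production, and is_valid_mixed is a hypothetical helper.

```python
import re

# Simplified XML Name: ASCII only (the real production allows much more).
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"
S = r"[ \t\r\n]*"  # optional white space (S? in the spec)

# Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'
#         | '(' S? '#PCDATA' S? ')'
MIXED = re.compile(
    rf"(?:\({S}#PCDATA(?:{S}\|{S}{NAME})*{S}\)\*"
    rf"|\({S}#PCDATA{S}\))"
)


def is_valid_mixed(model: str) -> bool:
    """Return True if `model` is a legal XML mixed-content model."""
    return MIXED.fullmatch(model.strip()) is not None


for model in ["(#PCDATA|back|front)*", "(#PCDATA|para)*",
              "(#PCDATA,back?,#PCDATA,front?)", "(#PCDATA|back)"]:
    print(model, is_valid_mixed(model))
```

Both models used in the XML DTD are accepted; a sequence containing #PCDATA twice is rejected, and so is a choice group with element names but no trailing *.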
You can check this example using libxml2 (the xmllint --valid command-line utility).
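Note what the mixed model gives up: (#PCDATA|back|front)* accepts back and front in any order and any number of times. If the (back?,front?) ordering from the question still matters, one option is to check it in application code after parsing. The following is a minimal sketch using Python's standard xml.etree.ElementTree, which does not validate against the DTD; children_in_order is a hypothetical helper, not part of any library.

```python
import re
import xml.etree.ElementTree as ET

# The XML example from this answer (DTD in the internal subset).
DOC = """<!DOCTYPE root [
<!ELEMENT root (#PCDATA|back|front)*>
<!ELEMENT back (js*)>
<!ELEMENT front (#PCDATA|para)*>
]>
<root>
allow<back><!-- no#PCDATA --></back>
allow<front>allow#PCDATA</front>
</root>"""


def children_in_order(xml_text: str) -> bool:
    """Hypothetical post-parse check: root's child elements must follow
    (back?, front?). Text between them is ignored. ElementTree is
    non-validating, so the DTD itself is not enforced here."""
    root = ET.fromstring(xml_text)
    # Encode the child element sequence, e.g. "<back><front>".
    # Comments are dropped by the default parser, so only elements remain.
    tags = "".join(f"<{child.tag}>" for child in root)
    return re.fullmatch(r"(?:<back>)?(?:<front>)?", tags) is not None


print(children_in_order(DOC))                              # True
print(children_in_order("<root><front/>x<back/></root>"))  # False
```

The DTD handles what it can (no text inside back, text allowed around the elements), and the ordering and cardinality check is done separately.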
SGML, on the other hand, which XML is based on and of which XML DTD syntax is designed to be a subset, doesn't have this restriction and allows #PCDATA to occur multiple times in a content model:
<!DOCTYPE root [
<!-- NOTE: this is SGML not XML -->
<!ELEMENT root - - (#PCDATA,((back,#PCDATA,front?)|(front?)))>
<!ELEMENT back - - (js*)>
<!ELEMENT front - - (#PCDATA|para)*>
]>
<root>
allow<back><!-- no#PCDATA --></back>
allow<front>allow#PCDATA</front></root>
You can check this SGML example using OpenSP (the osgmlnorm command-line utility) or sgmljs (the sgmlproc command-line utility). However, SGML has restrictions in this context as well:
- You will have noticed that the </root> end-element tag is placed at the end of the last line. This is because SGML would interpret a newline there as character data, unless it occurs after a line containing only a single element with start- and end-element tags, in which case the newline is considered to be purely for formatting.
- A content model such as (#PCDATA,back?,#PCDATA,front?) is ambiguous, and therefore disallowed: if the optional back element isn't present, text content could be attributed to either of the two #PCDATA tokens.
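The ambiguity can be illustrated by counting how many ways a sequence of text runs and elements can be matched against the model. This Python sketch is a simplification: it treats each #PCDATA token as optional (matching one text run or nothing), encodes the model as a hypothetical list of (token, optional) pairs, and counts complete parses rather than implementing SGML's actual no-lookahead rule. Still, more than one parse for the same input means the model is ambiguous.

```python
from functools import lru_cache

# Hypothetical encoding of the SGML model (#PCDATA,back?,#PCDATA,front?)
# as (token, optional) pairs. Assumption: a #PCDATA token may match
# nothing, so it is marked optional like back? and front?.
MODEL = (("#PCDATA", True), ("back", True), ("#PCDATA", True), ("front", True))


def count_parses(model, items):
    """Count the ways a sequence of items ("#text" for a text run, or an
    element name) can be matched, in order, against a sequence model."""
    model, items = tuple(model), tuple(items)

    @lru_cache(maxsize=None)
    def go(i, j):
        # i: index of the next model token, j: index of the next input item.
        if i == len(model):
            return 1 if j == len(items) else 0
        token, optional = model[i]
        # Option 1: leave this token unmatched (only if it is optional).
        ways = go(i + 1, j) if optional else 0
        # Option 2: let this token consume the next input item.
        expected = "#text" if token == "#PCDATA" else token
        if j < len(items) and items[j] == expected:
            ways += go(i + 1, j + 1)
        return ways

    return go(0, 0)


print(count_parses(MODEL, ["#text", "back", "#text", "front"]))  # 1
print(count_parses(MODEL, ["#text", "front"]))                   # 2
```

With back present there is exactly one parse; without it, the leading text can go to either #PCDATA token, giving two parses.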