dtd sgml

DTD character data validation error due to linebreak


I have the following fake.dtd file:

<!ELEMENT outer - - (#PCDATA, foo, bar) >
<!ELEMENT foo - o (#PCDATA) >
<!ELEMENT bar - - (#PCDATA) >

And the following SGML document:

<!DOCTYPE outer SYSTEM "fake.dtd">
<OUTER>Document Title
    <FOO>1234
    <BAR>wxyz</BAR>
</OUTER>

I am getting a validation error using nsgmls:

4:19:E: character data is not allowed here

Note that putting </OUTER> on the same line as </BAR> solves the problem; the error refers to the line-break.

Is there a way to keep the SGML as is (because I already have thousands of documents like this), but change the DTD so that it validates?

Adding another #PCDATA to the end of the outer element seems silly because that would make characters other than newline legal.


Solution

  • The SGML Standard (ISO 8879:1986/A1:1988, 11.2.4) explicitly recommends against content models like (#PCDATA, foo, bar) (emphasis mine):

    NOTE - It is recommended that “#PCDATA” be used only when data characters are to be permitted anywhere in the content of the element; that is, in a content model where it is the sole token, or where "or" is the only connector used in any model group.

    Although #PCDATA appears only as the first token in the group, your outer element type is still declared to have mixed content, so data characters can occur anywhere. That is why the line break (a "record end") after </BAR> is recognized as a data character rather than as a mere separator. Since there is no corresponding #PCDATA token at that point in the model to absorb it, the parser reports the error. (The line before escapes the same error only because the </FOO> end-tag is omitted, so that line break falls inside foo.)
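
    For comparison, a content model in the form the note recommends would use "or" (|) as its only connector. The declaration below is only a sketch to illustrate that form, not a suggested fix: it would accept the documents as they are, but it also permits data characters anywhere in outer and any number of foo and bar elements in any order, which is far looser than the original model.

    ```sgml
    <!-- Illustration only: mixed content with "or" as the sole connector -->
    <!ELEMENT outer - - (#PCDATA | foo | bar)* >
    ```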


    The proper and common approach in this case would be to place the "Document Title" into an actual title element—for which one can allow omission of both the start- and end-tag:

    <!ELEMENT outer - - (title, foo, bar) >
    <!ELEMENT title o o (#PCDATA) >
    

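    Now outer has element content instead of mixed content, so the line break after </BAR> is treated as a mere separator rather than a data character, and the existing documents should validate unchanged.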

    (The same technique is used in several Standard DTDs, like the "General Document" example in annex E of the Standard.)
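
    Putting the pieces together, the complete fake.dtd might look as follows. This is a sketch: the foo and bar declarations are copied unchanged from the question, and only outer and the new title element differ. It assumes the SGML declaration in use permits tag omission (OMITTAG YES), which the question's DTD already relies on for foo.

    ```sgml
    <!-- outer now has element content: no #PCDATA token in its model -->
    <!ELEMENT outer - - (title, foo, bar) >
    <!-- both tags omissible, so the title text needs no markup -->
    <!ELEMENT title o o (#PCDATA) >
    <!ELEMENT foo   - o (#PCDATA) >
    <!ELEMENT bar   - - (#PCDATA) >
    ```

    With these declarations the parser should infer a <TITLE> start-tag before "Document Title" and an end-tag just before <FOO>.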