
DTD XML parsing


If I have:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE country[
<!ELEMENT country
(president | king | (king,queen) | queen)>
<!ELEMENT president (#PCDATA)>
<!ELEMENT king (#PCDATA)>
<!ELEMENT queen (#PCDATA)>
]>

Why does the content model (president | king | (king,queen) | queen) generate an error? If we try to validate <country><king>Luis</king></country>, we get the error message [...]Both 1st and 2nd occurence of "king" are possible. What if I write (president | (king) | (king,queen) | queen) instead?


Solution

  • It's because your content model is non-deterministic. This means that when the parser reads the king element, it cannot determine which alternative of the model is being matched without looking ahead. See the "Deterministic Content Models (Non-Normative)" appendix of the XML 1.0 specification for more details.

    What I would do is make queen optional when a king is present:

    <!ELEMENT country (president | (king,queen?) | queen)>
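
    To check that this rewrite does not change which documents are allowed, here is a minimal sketch that treats both content models as ordinary regular expressions over the sequence of child element names. The names ORIGINAL, REWRITTEN, and accepted_sequences are made up for this illustration, and Python's backtracking re module is only a stand-in for the grammar, not how a validating parser works.

```python
import re
from itertools import product

NAMES = ("president", "king", "queen")

# Each child element is written as its name followed by one space.
ORIGINAL = re.compile(r"(?:president |king |king queen |queen )")
REWRITTEN = re.compile(r"(?:president |king (?:queen )?|queen )")


def accepts(pattern, children):
    """True if the whole sequence of child names matches the model."""
    text = "".join(name + " " for name in children)
    return pattern.fullmatch(text) is not None


def accepted_sequences(pattern, max_len=3):
    """All child sequences of up to max_len elements that the model allows."""
    return {
        seq
        for n in range(max_len + 1)
        for seq in product(NAMES, repeat=n)
        if accepts(pattern, seq)
    }


# Both models allow exactly the same four kinds of country.
print(accepted_sequences(ORIGINAL) == accepted_sequences(REWRITTEN))  # True
```

    So the two declarations describe the same set of documents; only the second one lets the parser decide without looking ahead.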
    

    Response to comment...

    The XML processor cannot use "look ahead" in order to figure out what is gonna "happen" after matching "king", right?

    Right. For example, let's say we have this country element:

    <country>
      <king/>
    </country>
    

    and we declare country like this in our DTD:

    <!ELEMENT country (president | king | (king,queen) | queen)>
    

    there are 4 possible options for the content of country:

    1. one "president"
    2. one "king"
    3. one "king" followed by one "queen"
    4. one "queen"

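    So if we have a king element in our XML, the parser cannot tell, at the moment it reads king, whether it is matching option #2 or option #3. It would have to look ahead for a queen to decide, which it is not allowed to do.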

    If we declare country like this:

    <!ELEMENT country (president | (king,queen?) | queen)>
    

    there are 3 possible options for the content of country:

    1. one "president"
    2. one "king" followed by zero or one "queen"
    3. one "queen"

    As you can see, if we have a king element in our XML there is only one possible option that the parser can choose (option #2), whether or not a queen follows.
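
    The reasoning above can be sketched as a small determinism check, in the spirit of the first/follow-set construction that the XML specification's appendix refers to. This is only an illustration under stated assumptions: the nested-tuple encoding of a content model, the function name clashes, and the limited operator set ("choice", "seq", "opt") are all invented here, and real parsers may implement the check differently.

```python
from collections import defaultdict


def clashes(model):
    """Return the element names that make `model` non-deterministic."""
    symbols = []               # symbols[i] = element name at position i
    follow = defaultdict(set)  # follow[i] = positions allowed right after i

    def walk(node):
        # Returns (nullable, first positions, last positions) for a sub-model.
        if isinstance(node, str):
            symbols.append(node)
            pos = len(symbols) - 1
            return False, {pos}, {pos}
        op, *kids = node
        if op == "opt":        # x?
            _, first, last = walk(kids[0])
            return True, first, last
        parts = [walk(kid) for kid in kids]
        if op == "choice":     # a | b | ...
            return (any(p[0] for p in parts),
                    set().union(*(p[1] for p in parts)),
                    set().union(*(p[2] for p in parts)))
        if op == "seq":        # a, b, ...
            nullable, first, last = True, set(), set()
            for k_null, k_first, k_last in parts:
                for pos in last:
                    follow[pos] |= k_first
                if nullable:
                    first |= k_first
                last = k_last | (last if k_null else set())
                nullable = nullable and k_null
            return nullable, first, last
        raise ValueError(f"unsupported operator: {op!r}")

    _, first, _ = walk(model)
    found = set()
    # A clash is two different positions with the same name competing
    # at the start of the model or right after some position.
    for group in [first, *follow.values()]:
        names = [symbols[pos] for pos in group]
        found.update(n for n in names if names.count(n) > 1)
    return sorted(found)


ORIGINAL = ("choice", "president", "king", ("seq", "king", "queen"), "queen")
WITH_PARENS = ("choice", "president", ("seq", "king"), ("seq", "king", "queen"), "queen")
REWRITTEN = ("choice", "president", ("seq", "king", ("opt", "queen")), "queen")

print(clashes(ORIGINAL))     # ['king']
print(clashes(WITH_PARENS))  # ['king']
print(clashes(REWRITTEN))    # []
```

    Note that wrapping king in its own parentheses, as in (president | (king) | (king,queen) | queen), leaves two competing king positions at the start of the model, so it is flagged in the same way as the original declaration.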