XSD complex types are, well... complex
Some facts first:
Because of all that, I think this is possible in an xsd/xml (pseudo-xsd):
<someComplexType>
<sequence minOccurs=0>
<element A>
<element B>
</sequence>
<sequence>
<element A>
<element C>
</sequence>
</someComplexType>
Not sure it is legal, but it is possible.
Now, let's say a sax/stax parser sees an element A.
What happens next? How should I determine if it is an A from the first sequence or an A from the second sequence...before checking the next element(s)?
If the next element is B, this A should probably(!) be a match for the first sequence.
If the next element is C, the same A should probably(!) be a match for the second sequence.
("Probably" because the whole thing above can be an optional nested part of another sequence which makes me see this as a recursive issue, it might occur at any depth and to make a decision one might have to check multiple elements following, at multiple depths)
With a similar xsd, while parsing the xml it is (I think) impossible to tell if the current element belongs to a specific compositor or not, until....later, not even sure when.
Anyway, here are my two questions:
ps: trying to create an xml mapping tool (don't ask why) and I don't know how to match the parsed contents with the xsd structure, under such circumstances.
Thanks a lot
As any XSD processor will tell you, your content model is not allowed because it is ambiguous, for exactly the reasons you describe. (It violates the UPA constraint). The content can (and must) be changed from (AB | AC)
to A (B|C)
.
Of course, there are similar technologies (such as regular expression processors, or RelaxNG) where ambiguous grammars are permitted: parsing them requires techniques such as look-ahead or backtracking.
If you're planning to implement a processor that does this task, I would recommend reading up on some of the computer science. A standard textbook - very old, but still perfectly usable - is the "Dragon book" by Aho and Ullmann (can't remember the title, and I left my copy in the office). More specifically regarding XSD, there are a couple of good papers by Thompson and Tobin which can be found on Henry Thompson's web site at the University of Edinburgh.