Error Parsing '&' Character Using Woodstox Parser


Java: 1.6
Woodstox: 4.1.4

I'm trying to get to grips with the Woodstox XML parser, but the start has been hard :) I have a small problem when parsing XML like this:

<teams>
    <team id="team1">Mom & Dad</team>
    <team id="team2">Son & Daughter</team>
</teams>

It is simple, but unfortunately I'm getting this exception:

Exception in thread "main" [com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character ' ' (code 32) (missing name?)
 at [row,col {unknown-source}]: [2,24]

This happens because of the & character.

Is it possible to read this XML without getting the exception?


Solution

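  • A bare & is not allowed in XML character data; it must be escaped as &amp; or enclosed in a CDATA section. The parser treats & as the start of an entity reference and expects a name right after it, so the space that follows is what triggers the "missing name?" error.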

    <teams>
        <team id="team1">Mom &amp; Dad</team>
        <team id="team2"><![CDATA[Son & Daughter]]></team>
    </teams>
    

    From: http://www.w3.org/TR/REC-xml/#syntax

    The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.
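
To check that the corrected document parses, here is a minimal sketch using the standard StAX API (javax.xml.stream), which Woodstox implements. The class name TeamsReader and the method readTeams are made up for this illustration, and the XML is passed in as a string only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of Woodstox or the original question.
class TeamsReader {

    // Returns the text content of every <team> element in the given XML.
    static List<String> readTeams(String xml) throws XMLStreamException {
        // Picks up whichever StAX implementation is available
        // (Woodstox if it is on the classpath, otherwise the JDK default).
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<String>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "team".equals(reader.getLocalName())) {
                    // getElementText() resolves &amp; and unwraps CDATA sections.
                    names.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<teams>"
                + "<team id=\"team1\">Mom &amp; Dad</team>"
                + "<team id=\"team2\"><![CDATA[Son & Daughter]]></team>"
                + "</teams>";
        System.out.println(readTeams(xml)); // prints [Mom & Dad, Son & Daughter]
    }
}
```

Both forms come back to the application as a plain & in the element text, so which one to use is mostly a matter of how the XML is produced. If you generate the document yourself with XMLStreamWriter, writeCharacters performs the &amp; escaping for you.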