Parse a list of XML fragments with no root element from a stream input


Is it possible in Java, using the SAX API, to parse a list of XML fragments with no root element from an input stream?

I tried parsing such input but got a

org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.

before even the endDocument event was fired.

I would rather not settle for obvious but clumsy solutions such as "prepend a custom root element" or "use buffered fragment parsing".

I am using the standard SAX API of Java 1.6. The SAX parser factory was configured with setValidating(false), in case anyone wonders.


Solution

  • First, and most important of all, the content you are parsing is not an XML document. From the XML Specification:

    [Definition: There is exactly one element, called the root, or document element, no part of which appears in the content of any other element.]

    Now, as to parsing this with SAX - in spite of what you said about clumsiness - I'd suggest the following approach:

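    // Needs java.io.{ByteArrayInputStream, InputStream, SequenceInputStream},
    // java.nio.charset.Charset and java.util.{Arrays, Collections, Enumeration}.
    // UTF-8 is an assumption here; the wrapper tags are plain ASCII, so they
    // also fit any ASCII-compatible encoding of the fragments.
    Charset utf8 = Charset.forName("UTF-8");
    Enumeration<InputStream> streams = Collections.enumeration(
        Arrays.<InputStream>asList(
            new ByteArrayInputStream("<root>".getBytes(utf8)),
            yourXmlLikeStream,
            new ByteArrayInputStream("</root>".getBytes(utf8))));
    
    // Concatenate the three streams in order: opening tag, fragments, closing tag.
    SequenceInputStream seqStream = new SequenceInputStream(streams);
    
    // Now pass the `seqStream` into the SAX parser.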
    

    SequenceInputStream is a convenient way to concatenate multiple input streams into a single stream. The streams are read in the order in which they are passed to the constructor (or, in this case, returned by the Enumeration).

    Pass it to your SAX parser, and you are done.
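
For completeness, here is a minimal end-to-end sketch of the wrapping approach, under a few assumptions: the class name `FragmentSaxDemo`, the method `topLevelElementNames`, the wrapper name `root`, and the sample input are all made up for illustration, and the fragments are assumed to be UTF-8 (or another ASCII-compatible encoding) with no leading `<?xml ...?>` declaration, since a declaration placed after the synthetic opening tag would not be well-formed. The handler tracks nesting depth so that the synthetic wrapper can be told apart from the real top-level fragments.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class FragmentSaxDemo {
    // Hypothetical wrapper element name; any legal XML name would do.
    private static final String WRAPPER = "root";
    private static final Charset UTF8 = Charset.forName("UTF-8");

    /** Parses a stream of sibling fragments and returns the names of the top-level elements. */
    static List<String> topLevelElementNames(InputStream fragments) throws Exception {
        // Surround the fragment stream with a synthetic opening and closing tag.
        InputStream wrapped = new SequenceInputStream(Collections.enumeration(
            Arrays.<InputStream>asList(
                new ByteArrayInputStream(("<" + WRAPPER + ">").getBytes(UTF8)),
                fragments,
                new ByteArrayInputStream(("</" + WRAPPER + ">").getBytes(UTF8)))));

        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Depth 0 is the synthetic wrapper; depth 1 is a real top-level fragment.
                if (depth == 1) {
                    names.add(qName);
                }
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(wrapped, handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(
            "<item id=\"1\"/><item id=\"2\">text</item><note/>".getBytes(UTF8));
        System.out.println(topLevelElementNames(in)); // prints [item, item, note]
    }
}
```

Because the wrapper is fed through the same stream, the parser still works incrementally: nothing is buffered beyond what the parser itself reads, and the handler simply ignores the events for the synthetic element.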