java xml sax jdom

SAXBuilder().build(InputStream) - does this read entire file into memory?


Reading the docs, this is the method used in all the examples I've seen:

(Version of org.jdom.input.SAXBuilder is jdom-1.1.jar)

Document doc = new SAXBuilder().build(is);
Element root = doc.getRootElement();
Element child = root.getChild("someChildElement");
...

where is is an InputStream variable.

I'm wondering, since this is a SAX builder (as opposed to a DOM builder), does the entire inputstream get read into the document object with the build method? Or is it working off a lazy load and as long as I request elements with Element.getChildren() or similar functions (stemming from the root node) that are forward-only through the document, then the builder automatically takes care of loading chunks of the stream for me?

I need to be sure I'm not loading the whole file into memory.

Thanks, Mike


Solution

  • Like a DOM parser, the JDOM builder loads the whole XML resource into memory and returns a Document instance that lets you navigate the elements of the XML. So no, SAXBuilder.build(InputStream) is not lazy: the entire stream is read before the method returns.
    Some references here:

    the DOM standard is a codified standard for an in-memory document model.

    And here:

    JDOM works on the logical in-memory XML tree,

    Both DOM and JDOM can use a SAX parser internally to read the XML resource (SAXBuilder does, as its name suggests), but they use it only to store the whole content in the Document instance that they return. Indeed, with DOM and JDOM, the client never needs to provide a handler to intercept the events triggered by the SAX parser.

    Note that neither DOM nor JDOM is obliged to use SAX internally.
    They use it mainly because the SAX standard already exists, so it makes sense to use it for reporting errors.
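To see the eager behaviour for yourself, here is a minimal sketch. JDOM is not part of the JDK, so it uses the JDK's own DOM builder as a stand-in (an assumption made only to keep the example self-contained); the class and method names are made up for illustration. It parses a stream and then checks how many bytes are left unread, before any element has been requested.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: shows that a tree builder consumes the whole
// stream inside parse(), before the caller navigates anything.
class EagerTreeDemo {

    /** Parses the XML with the JDK DOM builder and returns the number of unread bytes. */
    static int bytesLeftAfterParse(String xml) throws Exception {
        ByteArrayInputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        // The root is already available, but we have not asked for any children yet.
        doc.getDocumentElement();
        return in.available();
    }

    public static void main(String[] args) throws Exception {
        // Build a document with many children so it is larger than any read buffer.
        StringBuilder xml = new StringBuilder("<root>");
        for (int i = 0; i < 50_000; i++) {
            xml.append("<someChildElement/>");
        }
        xml.append("</root>");
        System.out.println("bytes left unread: " + bytesLeftAfterParse(xml.toString()));
    }
}
```

Nothing is left in the stream once parse() returns, which is what you would expect from an API that hands back a complete in-memory tree.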


    I need to be sure I'm not loading the whole file into memory.

    You have two programming models to work with XML: streaming and the document object model (DOM).
    You are looking for the first one.

    So use a SAX parser directly and provide your own handler for the events it generates (startDocument(), startElement(), and so on) or, as an alternative, look at a more user-friendly API, StAX (Streaming API for XML):

    As an API in the JAXP family, StAX can be compared, among other APIs, to SAX, TrAX, and JDOM. Of the latter two, StAX is not as powerful or flexible as TrAX or JDOM, but neither does it require as much memory or processor load to be useful, and StAX can, in many cases, outperform the DOM-based APIs. The same arguments outlined above, weighing the cost/benefits of the DOM model versus the streaming model, apply here.
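As a rough sketch of the two streaming options mentioned above, the following counts elements with a given name using only classes that ship with the JDK, first with a SAX handler (the parser pushes events to you) and then with a StAX reader (you pull events from the parser). The class name, method names and sample XML are made up for illustration; "someChildElement" is borrowed from the question. Neither method keeps a tree around, so memory use does not grow with the size of the document.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class comparing the SAX (push) and StAX (pull) models.
class StreamingCountDemo {

    /** SAX: the parser calls our handler for every start tag it encounters. */
    static int countWithSax(InputStream in, String name) throws Exception {
        int[] count = {0}; // array so the anonymous handler can update it
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // The default factory is not namespace-aware, so compare qName.
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return count[0];
    }

    /** StAX: we ask the reader for the next event and decide what to do with it. */
    static int countWithStax(InputStream in, String name) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(name)) {
                    count++;
                }
            }
        } finally {
            reader.close(); // releases the reader; does not close the underlying stream
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><someChildElement/><other/><someChildElement/></root>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(countWithSax(new ByteArrayInputStream(xml), "someChildElement"));
        System.out.println(countWithStax(new ByteArrayInputStream(xml), "someChildElement"));
    }
}
```

With StAX you can also stop reading as soon as you have what you need by simply leaving the loop, which is awkward to do from inside a SAX handler.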