java, xml-parsing, xom

XOM Parser Out of heap memory


I am parsing files with the XOM library. The Java application works well, but I run into a memory problem when I parse large files (more than 200 MB).

I run out of heap memory when I build the document with the code below:

        Builder profileFileBuilder = new Builder(profileFileXMLReader);
        Document profileFileDocument = profileFileBuilder.build(profileFile);

What are my alternatives for building documents from files of that size? I tried to allocate more memory to the JVM, but it doesn't accept more than 1024 MB.

Thank you in advance


Solution

  • You can use XOM as a streaming parser by extending NodeFactory so that it doesn't keep the XML in memory, but processes each piece and then discards it. This works well for XML that has many smaller nodes wrapped in a container element. For instance, XML like:

    <records>
      <record><a_little_xml/></record>
      <record><a_little_xml/></record>
      <record><a_little_xml/></record>
      <record><a_little_xml/></record>
      <record><a_little_xml/></record>
    </records>
    

    There is an example in the XOM documentation of how to extend the NodeFactory: http://www.xom.nu/tutorial.xhtml#Lister

    In short, you process each element as soon as it has been parsed (at whatever level of the document you are interested in) and then don't add it to the in-memory tree: http://www.xom.nu/tutorial.xhtml#d0e1424
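
As a rough illustration of the approach described above, here is a minimal sketch of such a NodeFactory subclass. It assumes the repeated element is named `record`, as in the sample XML; the class name `RecordStreamer`, the counter, and the place where the processing happens are hypothetical and only there for illustration. Note that the root element has to stay in the tree, so only the `record` elements are dropped.

```java
import java.io.IOException;
import java.io.StringReader;

import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.NodeFactory;
import nu.xom.Nodes;
import nu.xom.ParsingException;

// Hypothetical example class: handles each <record> as soon as it is
// complete and then drops it, so the full tree is never held in memory.
public class RecordStreamer extends NodeFactory {

    private final Nodes empty = new Nodes();
    private int recordCount = 0;

    @Override
    public Nodes finishMakingElement(Element element) {
        // "record" is an assumption taken from the sample XML above.
        if ("record".equals(element.getLocalName())) {
            recordCount++;
            // Process the fully built record here, e.g. read
            // element.getValue() or its children, then forget it.
            return empty; // nothing is added to the parent element
        }
        // Keep everything else, including the root element.
        return new Nodes(element);
    }

    public int getRecordCount() {
        return recordCount;
    }

    public static void main(String[] args) throws ParsingException, IOException {
        String xml = "<records>"
                + "<record><a_little_xml/></record>"
                + "<record><a_little_xml/></record>"
                + "<record><a_little_xml/></record>"
                + "</records>";

        RecordStreamer factory = new RecordStreamer();
        Builder builder = new Builder(factory);
        Document doc = builder.build(new StringReader(xml));

        System.out.println("Processed " + factory.getRecordCount() + " records");
        System.out.println("Child elements left under the root: "
                + doc.getRootElement().getChildElements().size());
    }
}
```

With the code from the question, the factory could presumably be passed in through the `Builder(XMLReader, boolean, NodeFactory)` constructor, for example `new Builder(profileFileXMLReader, false, factory)`. Whitespace text between records is still added to the root in this sketch; overriding `makeText` in the same way would avoid that.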