java xml dom jaxb xmlbeans

Java XML parsing DOM performance


I'm part of a team creating a data store that passes information around in large XML documents (herein called messages). On the back end, the messages get shredded apart and stored in Accumulo in pieces. When a caller requests data, the pieces get reassembled into a message tailored for the caller. The schemas are somewhat complicated, so we couldn't use JAXB out of the box. The team (this was a few years ago) assumed that DOM wasn't performant. We're now buried in layer after layer of half-broken parsing code that will take months to finish, will break the second someone changes the schema, and is making me want to jam a soldering iron into my eyeball. As far as I can tell, if we switch to DOM, a lot of this fart code can be cut and the code base will be more resilient to future changes. My team lead is telling me that there's a performance hit in using the DOM, but the only data I can find to support that assumption is from 2006 or earlier.

Is parsing large XML documents via DOM still sufficiently slow to warrant all the pain that XMLBeans is causing us?
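One way to get current numbers instead of relying on 2006-era claims is to time a DOM parse on a message of representative size. The following is only a rough sketch using the JDK's built-in JAXP parser; the class name `DomParseTiming`, the `<message>`/`<record>` element names, and the record count are made up for illustration, and a real measurement would use actual messages and a proper harness such as JMH.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomParseTiming {

    // Builds a synthetic message with the given number of <record> children.
    // The element names are hypothetical stand-ins for a real schema.
    public static String buildMessage(int records) {
        StringBuilder sb = new StringBuilder("<message>");
        for (int i = 0; i < records; i++) {
            sb.append("<record id=\"").append(i).append("\"><value>v")
              .append(i).append("</value></record>");
        }
        return sb.append("</message>").toString();
    }

    // Parses the XML text into an in-memory DOM tree.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String xml = buildMessage(100_000);

        // Warm up so JIT compilation does not dominate the timed run.
        for (int i = 0; i < 5; i++) {
            parse(xml);
        }

        long start = System.nanoTime();
        Document doc = parse(xml);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        int count = doc.getElementsByTagName("record").getLength();
        System.out.println("Parsed " + count + " records in " + elapsedMs + " ms");
    }
}
```

Running the same harness against the XMLBeans path with identical input would give a like-for-like comparison; a single timed run like this is only a first indication.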

Edit 1: In response to some of your comments:

1) This is a government project so I can't get rid of the XML part (as much as I really want to).

2) The issue with JAXB, as I understand it, had to do with the substitution groups present in our schemas. Perhaps I should restate the JAXB issue as one of effort versus return.

3) What I'm looking for is some kind of recent data supporting or disproving the contention that XMLBeans is worth the pain we're going through, writing a bazillion lines of brittle binding code, because it gives us an edge in performance. Something like jOOX looks so much easier to deal with, and I'm pretty sure we can still validate the result after the server has reassembled a shredded message before sending it back to the caller.
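To illustrate the last point, here is a minimal sketch of validating an already-built DOM tree against an XSD with the standard `javax.xml.validation` API, so a reassembled message could be checked before it goes back to the caller. The class name `DomValidation` and the tiny inline schema are assumptions for illustration only; our real schemas, with their substitution groups, are far more involved.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomValidation {

    // Hypothetical toy schema: a <message> containing one or more <record> strings.
    public static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"message\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"record\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Parses XML text into a DOM tree; namespace awareness is needed for validation.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    // Returns true if the in-memory document conforms to the given XSD text.
    public static boolean isValid(Document doc, String xsd) throws Exception {
        SchemaFactory schemaFactory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));
        try {
            // Validates the DOM directly; no re-serialization is required.
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Document good = parse("<message><record>a</record></message>");
        Document bad = parse("<message><oops/></message>");
        System.out.println("good valid: " + isValid(good, XSD));
        System.out.println("bad valid: " + isValid(bad, XSD));
    }
}
```

A compiled `Schema` object is thread-safe, so in a server it could be built once and shared, with a fresh `Validator` created per message.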

So does anyone out there in SO land know of any data germane to this issue that's no more than five years old?


Solution

  • If you want the best technology for heavy duty XML processing, you might want to investigate this paper. The best technology will no doubt be clear after you read it...

    The paper details:

    Processing XML with Java – A Performance Benchmark
    Bruno Oliveira (1), Vasco Santos (1) and Orlando Belo (2)
    (1) CIICESI, School of Management and Technology,
        Polytechnic of Porto Felgueiras, PORTUGAL
    (2) Algoritmi R&D Centre, University of Minho,
        4710-057 Braga, PORTUGAL