java xml file splitting

Splitting a big XML file using Java


I'm trying to create a Java program that will split a selected XML file.

XML file data sample:

<EmployeeDetails>
<Employee>
<FirstName>Ben</FirstName>
</Employee>
<Employee>
<FirstName>George</FirstName>
</Employee>
<Employee>
<FirstName>Cling</FirstName>
</Employee>
</EmployeeDetails>

And so on. I have a 250 MB XML file, and it is always a pain to open it in an external program and split it manually so that others can read it (not every laptop or desktop can open such a large file). So I decided to write a Java program that will do the following:

- Select the XML file (already done)
- Split the file based on the number of tags. For example, the current file has 100k tags, and the program asks the user how many Employee elements he/she wants in each split file (e.g. 10k per file)
- Split the file (already done)

I just want to ask for help with the 2nd task. I have already spent 3-4 days looking into how to do this, or whether it is even feasible (in my mind, of course it is).

Any response will be appreciated.

Cheers, Grimm.


Solution

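  • Assuming a flat structure where the root element of the document, R, has a large number of children named X, the following XSLT 2.0 transformation writes one output file for each run of N consecutive X elements. For the sample in the question, R would be EmployeeDetails and X would be Employee.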

    <t:transform xmlns:t="http://www.w3.org/1999/XSL/Transform"
      version="2.0">
      <t:param name="N" select="100"/>
      <t:template match="/*">
        <t:for-each-group select="X" 
                          group-adjacent="(position()-1) idiv $N">
          <t:result-document href="{position()}.xml">
            <R>
              <t:copy-of select="current-group()"/>
            </R>
          </t:result-document>
        </t:for-each-group>
      </t:template>
    </t:transform> 
    

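    If you want to run this in streaming mode (without building the source tree in memory), then (a) add a mode declaration as a child of the transform element, written with the same prefix as the rest of the stylesheet, i.e. <t:mode streamable="yes"/>, and (b) run it using an XSLT 3.0 processor (Saxon-EE or Exselt).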
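
Since the question asks for a Java program, here is a rough sketch of the same "N elements per file" split done with the JDK's built-in StAX API (javax.xml.stream) instead of XSLT. This is a different technique from the stylesheet above, not a translation of it. The class name XmlSplitter, the method split, its parameters and the 1.xml, 2.xml, ... output names are all made up for illustration. It assumes the same flat structure (the repeating elements are direct children of the root) and simply drops anything else that sits directly under the root, such as whitespace or comments.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper (not from the answer above): streams the input with StAX
// and starts a new output file after every `perFile` matching child elements.
class XmlSplitter {

    /** Writes outDir/1.xml, outDir/2.xml, ... and returns how many files were written. */
    static int split(Path input, Path outDir, String childName, int perFile)
            throws IOException, XMLStreamException {
        if (perFile < 1) throw new IllegalArgumentException("perFile must be >= 1");
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        inFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD processing
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();
        Files.createDirectories(outDir);

        int fileCount = 0, inCurrentFile = 0, depth = 0;
        boolean copying = false;   // true while inside a matching child of the root
        StartElement root = null;  // remembered so every chunk gets the same root element
        XMLEventWriter writer = null;
        OutputStream out = null;

        try (InputStream in = Files.newInputStream(input)) {
            XMLEventReader reader = inFactory.createXMLEventReader(in);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    depth++;
                    StartElement start = event.asStartElement();
                    if (depth == 1) {
                        root = start;
                    } else if (depth == 2 && start.getName().getLocalPart().equals(childName)) {
                        if (writer == null) { // open the next chunk only when it is needed
                            fileCount++;
                            out = Files.newOutputStream(outDir.resolve(fileCount + ".xml"));
                            writer = outFactory.createXMLEventWriter(out, "UTF-8");
                            writer.add(events.createStartDocument("UTF-8", "1.0"));
                            writer.add(root);
                        }
                        copying = true;
                    }
                }
                if (copying) writer.add(event); // copy the child and everything inside it
                if (event.isEndElement()) {
                    if (depth == 2 && copying) { // a matching child has just ended
                        copying = false;
                        if (++inCurrentFile == perFile) {
                            closeChunk(writer, out, root, events);
                            writer = null;
                            inCurrentFile = 0;
                        }
                    }
                    depth--;
                }
            }
            reader.close();
        }
        if (writer != null) closeChunk(writer, out, root, events); // last, partial chunk
        return fileCount;
    }

    // Closes the root element and the document, then releases the file.
    private static void closeChunk(XMLEventWriter writer, OutputStream out,
            StartElement root, XMLEventFactory events) throws IOException, XMLStreamException {
        writer.add(events.createEndElement(root.getName(), root.getNamespaces()));
        writer.add(events.createEndDocument());
        writer.flush();
        writer.close(); // does not close the underlying stream, so close it explicitly
        out.close();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.out.println("usage: XmlSplitter <input.xml> <outDir> <childName> <perFile>");
            return;
        }
        int n = split(Paths.get(args[0]), Paths.get(args[1]), args[2], Integer.parseInt(args[3]));
        System.out.println("wrote " + n + " file(s)");
    }
}
```

With the three-employee sample from the question and perFile set to 2, this should give a first file containing Ben and George and a second file containing Cling, each wrapped in its own EmployeeDetails root. Because only one event is held at a time, memory use should stay small regardless of input size, which is the same goal as the streaming mode mentioned above. For brevity, the sketch does not clean up a half-written chunk if parsing fails partway through.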