java, xml, parsing, vtd-xml

How to parse XML using vtd-xml without hardcoding attribute names?


My actual XML file is over 2 GB; a small sample of it is shown below. Using vtd-xml, I have gotten this far:

Current Code:

https://gist.github.com/shadow-fox/21d1d4f30cbed0909f403c3ac0e1fa4d

public void reader() throws IOException, ParseException, NavException, XPathParseExceptionHuge, NavExceptionHuge,
        XPathEvalExceptionHuge {
    VTDGenHuge vg = new VTDGenHuge();
    if (vg.parseFile("sku_extract_main.xml", true, VTDGenHuge.MEM_MAPPED)) {
        VTDNavHuge vnh = vg.getNav();
        AutoPilotHuge aph = new AutoPilotHuge(vnh);
        aph.selectElementNS("*", "*");
        int i = 0;
        while (aph.iterate()) {
            int t = vnh.getText();
            if (t != -1) {
                System.out.println(vnh.toString(vnh.getCurrentIndex()) + "|||" + vnh.toNormalizedString(t));
                i++;
            }
        }
    }
}

Current result:

PVAL|||298374234
PVAL|||1231
PVAL|||brown
PVAL|||medium
PVAL|||7
PVAL|||solid
PVAL|||brown

What I want:

Sku_ID|||298374234
LotNum|||1231
COLOR|||brown
WIDTH|||medium
SIZE|||7
Pattern|||solid
Color Family|||brown

Sample xml:

<?xml version="1.0" encoding="UTF-8" ?>
<RECORDS>
  <RECORD>
    <PROP NAME="Sku_ID">
      <PVAL>298374234</PVAL>
    </PROP>
    <PROP NAME="LotNum">
      <PVAL>1231</PVAL>
    </PROP>
    <PROP NAME="COLOR">
      <PVAL>brown</PVAL>
    </PROP>
    <PROP NAME="WIDTH">
      <PVAL>medium</PVAL>
    </PROP>
    <PROP NAME="SIZE">
      <PVAL>7</PVAL>
    </PROP>
    <PROP NAME="Pattern">
      <PVAL>solid</PVAL>
    </PROP>
    <PROP NAME="Color Family">
      <PVAL>brown</PVAL>
    </PROP>
  </RECORD>
</RECORDS>

I don't want to hardcode the attribute names; I want to retrieve them as I visit each element. How would I do this?


Solution

  • Below is my edit of your code. It prints each attribute's name and value as the elements are visited, using an XPath expression on the attribute axis.

    // VTDGenHuge, VTDNavHuge and AutoPilotHuge come from com.ximpleware.extended
    public static void main(String[] args) throws Exception {
        VTDGenHuge vg = new VTDGenHuge();
        if (vg.parseFile("d:\\xml\\sku_extract_main.xml", true, VTDGenHuge.MEM_MAPPED)) {
            VTDNavHuge vnh = vg.getNav();
            AutoPilotHuge aph = new AutoPilotHuge(vnh);   // walks every element
            AutoPilotHuge aph2 = new AutoPilotHuge(vnh);  // walks the attributes of the current element
            aph.selectElementNS("*", "*");
            aph2.selectXPath("@*");
            while (aph.iterate()) {
                System.out.println(vnh.toString(vnh.getCurrentIndex()));
                int t = vnh.getText();
                if (t != -1) {
                    System.out.println(vnh.toString(vnh.getCurrentIndex()) + "|||" + vnh.toNormalizedString(t));
                }

                // Below is my addition: it evaluates the attribute axis on the current element.
                // push()/pop() keep the cursor consistent for the outer while loop.
                // resetXPath() is key here: without it, the XPath only works for the
                // first node returned by aph.iterate().
                vnh.push();
                int a; // attribute name index; a separate variable so no other counter is overwritten
                while ((a = aph2.evalXPath()) != -1) {
                    System.out.println(" attr name " + vnh.toString(a));
                    System.out.println("attr val   " + vnh.toString(a + 1)); // the value token follows the name token
                }
                aph2.resetXPath();
                vnh.pop();
            }
        }
    }
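
The code above prints attribute names and values on separate lines. To get the "Sku_ID|||298374234" format from the question, the remaining step is to remember the attribute value seen on the enclosing element and emit it together with the text found underneath it. The sketch below illustrates only that pairing logic, and it is not vtd-xml: it uses the JDK's built-in StAX reader (javax.xml.stream) so that it runs without extra libraries. The class name PropPairs and the method extract are made up for illustration, and it assumes each labelled element carries its label as its first attribute, as PROP does in the sample.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of vtd-xml: pairs the most recently seen
// attribute value with the next non-blank text, without naming any
// element or attribute in the code.
class PropPairs {

    static List<String> extract(Reader in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Deliver each text node as a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(in);

        List<String> out = new ArrayList<>();
        String label = null; // value of the first attribute on the last element that had one
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT && reader.getAttributeCount() > 0) {
                // Assumption: the first attribute holds the label (NAME in the sample).
                label = reader.getAttributeValue(0);
            } else if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace() && label != null) {
                out.add(label + "|||" + reader.getText().trim());
            }
        }
        reader.close();
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<RECORDS><RECORD>"
                + "<PROP NAME=\"Sku_ID\"><PVAL>298374234</PVAL></PROP>"
                + "<PROP NAME=\"Color Family\"><PVAL>brown</PVAL></PROP>"
                + "</RECORD></RECORDS>";
        for (String line : extract(new StringReader(xml))) {
            System.out.println(line);
        }
    }
}
```

For a file of the size mentioned in the question, the lines would be printed or written out as they are produced rather than collected in a list; the list is used here only to keep the sketch easy to check. The same pairing could be done in the vtd-xml loop by storing vnh.toString(a + 1) in a variable and printing it when getText() returns a valid index.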