I think I found a bug in version 2.13.4 of vtd-xml. In short, I have the following code snippet:
String test = "<catalog><description></description></catalog>";
VTDGen vg = new VTDGen();
vg.setDoc(test.getBytes("UTF-8"));
vg.parse(true);
VTDNav vn = vg.getNav();
// get nodes with no children, no text and no attributes
String xpath = "/catalog//*[not(child::node()) and not(child::text()) and count(@*)=0]";
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath(xpath);
// the block inside the while loop is never executed
while (ap.evalXPath() != -1) {
    System.out.println("current node " + vn.toRawString(vn.getCurrentIndex()));
}
This doesn't work: it finds no node at all, whereas it should find "description". The code above works if I use a self-closing tag instead:
String test = "<catalog><description/></catalog>";
The point is that every other XPath evaluator works with both versions of the XML. Sadly, I receive the XML from an external source, so I have no control over it. Breaking the XPath apart, I noticed that evaluating both
/catalog//*[not(child::node())]
and
/catalog//*[not(child::text())]
gives false as a result. Additionally, I tried something like:
String xpath = "/catalog/description/text()";
ap.selectXPath(xpath);
if (ap.evalXPath() != -1)
    System.out.println(vn.toRawString(vn.getCurrentIndex()));
This prints an empty string, so somehow VTD "thinks" the node has a text child (empty, but still there), whereas I expect no match. Any hints?
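For comparison, here is a small sketch that runs the same XPath through the JDK's built-in DOM parser and javax.xml.xpath evaluator (not vtd-xml). The class and method names are hypothetical. With DOM, <description></description> has no child nodes at all, so the expression is expected to match "description" for both forms of the input.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class: checks the question's XPath with the JDK evaluator
public class JdkXPathCheck {

    // Same expression as in the question
    static final String XPATH =
        "/catalog//*[not(child::node()) and not(child::text()) and count(@*)=0]";

    // Parses the XML with the JDK DOM parser and returns the names of the matching elements
    static List<String> matchingNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(XPATH, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath on: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Both lines print [description]
        System.out.println(matchingNames("<catalog><description></description></catalog>"));
        System.out.println(matchingNames("<catalog><description/></catalog>"));
    }
}
```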
When I faced this issue, I was left with three main options (see below). I went for the second one: use XMLModifier to fix the VTDNav. At the bottom of my answer, you'll find an implementation of this option and a sample output.
Here are the three main options I first thought of (in order of difficulty):
Option 1: Fix the XML before it reaches vtd-xml
This option isn't always possible (as in the OP's case). Moreover, it may be difficult to pre-process the XML beforehand.
Option 2: Use XMLModifier to fix the VTDNav
Find the empty elements with an XPath expression, replace them with self-closed tags and rebuild the VTDNav.
Option 2bis: Use XMLModifier#removeToken
A lower-level variant of the preceding solution would consist of looping over the tokens in the VTDNav and removing the unnecessary tokens with XMLModifier#removeToken.
Option 3: Fix vtd-xml itself
Taking this path may require more effort and more time. IMO, the optimized vtd-xml code isn't easy to grasp at first sight.
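For readers who can touch the XML before it reaches vtd-xml (Option 1), here is a naive, regex-based pre-processing sketch. It is an illustration only, not part of vtd-xml and not the approach used in this answer; the class and method names are hypothetical. It assumes there is no ">" inside attribute values and does not skip comments, CDATA sections or processing instructions, so a real XML-aware rewrite would be safer.

```java
import java.util.regex.Pattern;

// Hypothetical helper: collapses <name attrs></name> into <name attrs/> in raw XML text
public class EmptyElementCollapser {

    // Group 1: element name. Group 2: attribute text (always participates, may be empty).
    // The end tag must follow the start tag immediately (no whitespace content in between).
    // The lookbehind (?<!/) prevents rewriting an already self-closed child with
    // attributes, such as <a x='1'/>, that is directly followed by a parent end tag </a>.
    private static final Pattern EMPTY_ELEMENT = Pattern.compile(
        "<([A-Za-z_][\\w.:-]*)((?:\\s[^<>]*?)?)(?<!/)></\\1\\s*>");

    // Naive textual pre-processing, see the assumptions above
    static String collapse(String xml) {
        return EMPTY_ELEMENT.matcher(xml).replaceAll("<$1$2/>");
    }

    public static void main(String[] args) {
        // Prints <catalog><description/></catalog>
        System.out.println(collapse("<catalog><description></description></catalog>"));
    }
}
```

Because it works on the raw string, this runs before VTDGen ever sees the document, so no VTDNav needs to be rebuilt afterwards.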
Option 1 wasn't feasible in my case. I failed to implement Option 2bis: the "unnecessary" tokens still remained. I didn't look at Option 3 because I didn't want to fix some (rather complex) third-party code.
I was left with Option 2. Here is an implementation:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Test;
import com.ximpleware.AutoPilot;
import com.ximpleware.NavException;
import com.ximpleware.VTDException;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;
import com.ximpleware.XMLModifier;

// Enclosing JUnit 4 test class (the class name is illustrative)
public class EmptyElementsTest {

    @Test
    public void turnEmptyElementsIntoSelfClosedTags() throws VTDException, IOException {
        // STEP 1: Load the XML into a VTDNav
        // * Convert the initial XML into a byte array
        String xml = "<root><empty-element></empty-element><self-closed/><empty-element2 foo='bar'></empty-element2></root>";
        byte[] ba = xml.getBytes(StandardCharsets.UTF_8);
        // * Build the VTDNav and dump it to screen
        VTDGen vg = new VTDGen();
        vg.setDoc(ba);
        vg.parse(false); // Use `true' to activate namespace support
        VTDNav nav = vg.getNav();
        dump("BEFORE", nav);

        // STEP 2: Prepare to fix the VTDNav
        // * Prepare an AutoPilot to find empty elements: exactly one child node,
        //   that child being an empty text node (what vtd-xml reports for <x></x>)
        AutoPilot ap = new AutoPilot(nav);
        ap.selectXPath("//*[count(child::node())=1][text()='']");
        // * Prepare a simple regex matcher to build self-closed tags:
        //   <name attrs></name> becomes <name attrs/>
        Matcher elementReducer = Pattern.compile("^<(.+)></.+>$").matcher("");

        // STEP 3: Fix the VTDNav
        // * Instantiate an XMLModifier on the VTDNav
        XMLModifier xm = new XMLModifier(nav);
        ByteArrayOutputStream baos = new ByteArrayOutputStream(); // holds the current element to fix
        String utf8 = StandardCharsets.UTF_8.name();
        // * Find all empty elements and replace them
        while (ap.evalXPath() != -1) {
            nav.dumpFragment(baos);
            String emptyElementXml = baos.toString(utf8);
            String selfClosingTagXml = elementReducer.reset(emptyElementXml).replaceFirst("<$1/>");
            xm.remove();
            xm.insertAfterElement(selfClosingTagXml);
            baos.reset();
        }

        // * Rebuild the VTDNav and dump it to screen
        nav = xm.outputAndReparse(); // You MUST call this method to save all your changes
        dump("AFTER", nav);
    }

    private void dump(String msg, VTDNav nav) throws NavException, IOException {
        System.out.print(msg + ":\n ");
        nav.dumpFragment(System.out);
        System.out.print("\n\n");
    }
}
Sample output:
BEFORE:
<root><empty-element></empty-element><self-closed/><empty-element2 foo='bar'></empty-element2></root>
AFTER:
<root><empty-element/><self-closed/><empty-element2 foo='bar'/></root>
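If you want to double-check that the rewrite only changes the serialization and not the document tree, here is a small sketch that parses the BEFORE and AFTER strings above with the JDK's DOM parser (not vtd-xml) and compares the root elements with Node.isEqualNode. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper: checks that two XML strings parse to equal DOM trees
public class SameInfosetCheck {

    // Parses an XML string and returns its root element
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xml, e);
        }
    }

    // True if both root elements are equal according to DOM Level 3 isEqualNode
    // (same names, attributes and children, recursively)
    static boolean sameTree(String a, String b) {
        return root(a).isEqualNode(root(b));
    }

    public static void main(String[] args) {
        String before = "<root><empty-element></empty-element><self-closed/>"
            + "<empty-element2 foo='bar'></empty-element2></root>";
        String after = "<root><empty-element/><self-closed/><empty-element2 foo='bar'/></root>";
        System.out.println(sameTree(before, after)); // prints true
    }
}
```

Note that a whitespace-only element such as <a> </a> is not equal to <a/> in this comparison, because DOM keeps the whitespace as a text child.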