javaxmlxpathjaxp

Remove Empty Attributes from XML


I have a buggy xml that contains empty attributes and I have a parser that coughs on empty attributes. I have no control over the generation of the xml nor over the parser that coughs on empty attrs. So what I want to do is a pre-processing step that simply removes all empty attributes.

I have managed to find the empty attributes, but now I don't know how to remove them:

   XPathFactory xpf = XPathFactory.newInstance();
   XPath xpath = xpf.newXPath();
   XPathExpression expr = xpath.compile("//@*");
   Object result = expr.evaluate(d, XPathConstants.NODESET);

   if (result != null) {
    NodeList nodes = (NodeList) result;
    for(int node=0;node<nodes.getLength();node++)
    {
     Node n = nodes.item(node);
     if(isEmpty(n.getTextContent()))
     {
      this.log.warn("Found empty attribute declaration "+n.toString());
      NamedNodeMap parentAttrs = n.getParentNode().getAttributes();
      parentAttrs.removeNamedItem(n.getNodeName());
     }
    }

   } 

This code gives me a NPE when accessing n.getParentNode().getAttributes(). But how can I remove the empty attribute from an element, when I cannot access the element?


Solution

  • The following stylesheet will copy all content in the source document - except attributes that contain only whitespace. The first template simply copies everything - including empty attributes. However, the second template has a higher priority than the first due to its use of a predicate, which is why it will be chosen in preference to the more general first template when an empty attribute is encountered: and this second template does not generate any output.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> 
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
      <xsl:template match="@*[normalize-space()='']"/>
    </xsl:stylesheet>